SPATIAL DATA WITH PYTHON







Raster vs Vector

Geospatial data is generally stored in one of two ways: raster or vector.

In a raster format, the data is arranged as a grid of rows and columns, where each cell (also called a pixel) contains information about the area it represents. Most of the time, the cells are square and regularly spaced. The information in each cell can be a color, an altitude, a temperature, or any other indicator that the file’s author wants to convey.
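As a rough illustration of the grid idea, the sketch below models a tiny raster as a nested list and maps a cell to the coordinates of its center. The elevation values, the origin coordinates, the cell size, and the helper name `cell_center` are all made up for this example; real raster files store equivalent georeferencing metadata in their own way.

```python
# Hypothetical 3x4 elevation raster in metres; the values are invented.
elevation = [
    [120, 125, 130, 128],
    [118, 122, 127, 126],
    [115, 119, 123, 124],
]

# Assumed georeferencing: coordinates of the top-left corner of the grid
# and the side length of each square cell, in the same map units.
x_origin, y_origin = 500000.0, 4200000.0
cell_size = 30.0


def cell_center(row, col):
    """Return the (x, y) map coordinates of the center of a cell."""
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size  # row numbers increase downward
    return x, y


n_rows, n_cols = len(elevation), len(elevation[0])
print("grid size:", n_rows, "rows x", n_cols, "columns")
print("value at row 1, col 2:", elevation[1][2])
print("center of that cell:", cell_center(1, 2))
```

The point of the sketch is that a raster only needs the cell values plus a small amount of metadata (origin and cell size) to place every value on the map.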

Vector files, on the other hand, represent geographical entities using three basic elements: points, lines, and polygons (areas). Each geographical entity (object) can in turn store additional attributes. For example, if each entity is a country, the attributes could include the country’s name, population size, official language, and so on.
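To make the three basic elements concrete, here is a minimal sketch that writes one point, one line, and one polygon as GeoJSON-style features using only the standard library. All coordinates, names, and attribute values are invented for illustration.

```python
import json

# Each feature pairs a geometry with a dictionary of attributes ("properties").
# Coordinates are (longitude, latitude) pairs; every value here is made up.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
        "properties": {"name": "Sample city", "population": 2000000},
    },
    {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[2.0, 48.0], [2.5, 48.5], [3.0, 49.0]],
        },
        "properties": {"name": "Sample road"},
    },
    {
        "type": "Feature",
        "geometry": {
            "type": "Polygon",
            # One outer ring; the first and last positions are identical.
            "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]],
        },
        "properties": {"name": "Sample country", "official_language": "Example"},
    },
]

collection = {"type": "FeatureCollection", "features": features}
geometry_types = [feature["geometry"]["type"] for feature in features]
geojson_text = json.dumps(collection, indent=2)

print(geometry_types)
```

The attributes travel with each geometry, which is what lets a vector file answer questions such as "what is the population of this polygon?".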

raster_vs_vector.png

Raster Files

NASA, through its Visible Earth project, offers a large collection of images and data of our planet.
Thanks to this project, we can download an image in raster format from this link (2 MB, TIF).
The downloaded file has a .tif extension. Keep in mind that there are many other raster file formats.
Some of the more popular ones include Esri Grid, JPEG 2000, MrSID, and ECW.

earth_raster.png

Vector Files

The shapefile is one of the most popular GIS vector file formats, but it’s not the only one. You may also come across other formats such as GeoJSON, KML, KMZ, and even CSV. We’ll use Python’s geopandas library to read and analyze vector files.
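As a small preview of what geopandas works with, the sketch below builds a GeoDataFrame in memory rather than reading a file, so it does not depend on any particular dataset. The place names, populations, and coordinates are invented, and the example assumes geopandas and shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import Point

# A GeoDataFrame is a table whose rows are geographical entities:
# ordinary attribute columns plus one geometry column.
places = gpd.GeoDataFrame(
    {"name": ["Place A", "Place B"], "population": [1000, 2500]},
    geometry=[Point(-3.70, 40.42), Point(2.35, 48.85)],  # (longitude, latitude)
    crs="EPSG:4326",  # coordinates are assumed to be WGS 84 longitude/latitude
)

geometry_kinds = list(places.geom_type)
total_population = int(places["population"].sum())

print(places)
print(geometry_kinds, total_population)
```

For a vector file on disk, `gpd.read_file(path)` returns the same kind of object, so the attribute and geometry columns can be explored in the same way.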

Coordinate Reference System (CRS)

The Coordinate Reference System (CRS) tells us how the coordinates stored in a raster or vector file correspond to locations on Earth.

It also establishes which technique is used to “flatten” or “project” the Earth onto two dimensions. It’s a somewhat complex subject, but keeping it in mind will help you avoid some headaches.
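To give a feel for what "projecting" means, here is a minimal sketch of one well-known projection, the spherical Mercator formula used by Web Mercator (EPSG:3857). It is only an illustration of the idea: the function name `to_web_mercator` is made up for this example, and in practice a library such as geopandas (for instance through `GeoDataFrame.to_crs`) would handle the conversion.

```python
import math

# Sphere radius used by Web Mercator: the WGS 84 semi-major axis, in metres.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to x/y in metres (spherical Mercator)."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


x_origin, y_origin = to_web_mercator(0.0, 0.0)
x_edge, _ = to_web_mercator(180.0, 0.0)
_, y_north = to_web_mercator(0.0, 60.0)
_, y_south = to_web_mercator(0.0, -60.0)

print("equator / prime meridian:", round(x_origin, 3), round(y_origin, 3))
print("x at 180 degrees east:", round(x_edge, 2))
print("y at 60 N and 60 S:", round(y_north, 2), round(y_south, 2))
```

The same longitude/latitude pair produces different x/y values under a different projection, which is why two datasets need a known CRS before they can be overlaid correctly.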
