Using GeoPandas for analysis of geographic datasets

In this example we will use GeoPandas to create a heat map of the states of Mexico, colored by population.

Jose Luis Cardenas
4 min read · Sep 28, 2021

In the day-to-day work of data science we often have to work with geographic datasets.
These are datasets that, along with other properties, contain geographic columns describing a place, generally in two dimensions, as a geometry (point, line, polygon, multi-point, multi-line, etc.).
A widely used format for these datasets is GeoJSON.

GeoJSON is an open, JSON-based format that can express geographic features along with their non-geographic properties.
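To make the format concrete, here is a small sketch of a single GeoJSON feature parsed with Python's standard `json` module. The feature is hand-written for illustration; its name and population are made-up values, not taken from the datasets used in this article.

```python
import json

# A hand-written GeoJSON Feature; the property values are illustrative, not real data.
feature_text = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-99.13, 19.43]},
  "properties": {"name": "Example City", "population": 1000}
}
"""

feature = json.loads(feature_text)

# GeoJSON stores coordinates as [longitude, latitude]
lon, lat = feature["geometry"]["coordinates"]
geometry_type = feature["geometry"]["type"]
properties = feature["properties"]

print(geometry_type, lon, lat, properties["name"])
```

The `geometry` member holds the geographic part, while `properties` holds everything else.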

For this example we will use two datasets:
- D1 States of Mexico: Containing the geographies and names of the states of Mexico
- D2 Cities of Mexico: Dataset that contains the population of each city, its name and its point on the map

We will generate a heat map of population by state by intersecting each city with its state and summing the city populations.

The most common operations when working with geographic datasets are intersection, union, difference and distance calculations.
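As a rough illustration of these four operations, the sketch below uses shapely, the geometry library that GeoPandas builds on. The two squares and the two points are made-up shapes in plain planar units, chosen only so the results are easy to check by hand.

```python
from shapely.geometry import Point, box

# Two overlapping 2x2 squares (illustrative shapes in plain planar units)
a = box(0, 0, 2, 2)
b = box(1, 1, 3, 3)

intersection = a.intersection(b)  # area shared by both squares
union = a.union(b)                # area covered by either square
difference = a.difference(b)      # part of a that b does not cover

# Straight-line distance between two points
distance = Point(0, 0).distance(Point(3, 4))

print(intersection.area, union.area, difference.area, distance)
```

Note that distances and areas computed this way are in the units of the coordinates, so on longitude/latitude data they come out in degrees rather than meters.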
For these operations one of the most used libraries is GeoPandas. In this example we will use GeoPandas with Python:

Setup

import geopandas

Extracting the first dataset:

import geopandas
geo_df = geopandas.read_file("https://gist.githubusercontent.com/walkerke/76cb8cc5f949432f9555/raw/363c297ce82a4dcb9bdf003d82aa4f64bc695cf1/mx.geojson")
geo_df.head()

In this first dataset, the columns of interest are the name and the geometry. It contains the 32 states of Mexico.

GeoPandas gives us an easy way to plot geographic datasets and obtain a map for analysis:

geo_df.plot()

Extracting the second dataset:

import pandas as pd
import io
import requests
data = requests.get("https://simplemaps.com/static/data/country-cities/mx/mx.csv")
df = pd.read_csv(io.StringIO(data.content.decode('utf-8')))
df.head()

In this second dataset we have the cities of Mexico and the population of each one. In this case we do not have a column with geometries, but we do have latitude and longitude fields that we can convert to a point on the map.

Convert to geography

We can create a geometry column with GeoPandas in a simple way. To this column we assign the coordinate reference system (CRS) EPSG:4326, which is WGS 84 longitude/latitude.

A coordinate reference system, often loosely called a projection, establishes a relationship between coordinates and locations on the surface of the earth.
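To see what the choice of reference system changes, the sketch below reprojects a single point from EPSG:4326 to EPSG:3857 (Web Mercator). The point and the target system are assumptions picked for illustration; the rest of this article stays in EPSG:4326.

```python
import geopandas
from shapely.geometry import Point

# One illustrative point near Mexico City, given as (longitude, latitude) in EPSG:4326
points = geopandas.GeoSeries([Point(-99.13, 19.43)], crs="EPSG:4326")

# Reproject to Web Mercator (EPSG:3857), whose coordinates are in meters
points_mercator = points.to_crs("EPSG:3857")

print(points.crs)
print(points_mercator.iloc[0])
```

The same location is now described by very different numbers, which is why two datasets should share a reference system before they are intersected.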

geo_df2 = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.lng, df.lat), crs="EPSG:4326")
geo_df2.head()

In the dataset we now see the geometry field.

Intersecting

We intersect the city points with the state geometries. GeoPandas provides useful methods for these types of operations, such as the overlay method.

In this case we use the intersection:

intersect = geopandas.overlay(geo_df2, geo_df, how='intersection')
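For comparison, a spatial join is another common way to attach polygon attributes to points. The sketch below uses `geopandas.sjoin` instead of `overlay`, on made-up data: the two square "states" and three "cities" are hypothetical stand-ins, not the datasets loaded above.

```python
import geopandas
from shapely.geometry import Point, box

# Two made-up square "states" and three made-up "cities"
states = geopandas.GeoDataFrame(
    {"state": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
cities = geopandas.GeoDataFrame(
    {"city": ["c1", "c2", "c3"], "population": [100, 200, 300]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(1.6, 0.2)],
    crs="EPSG:4326",
)

# Each city row gains the attributes of the state polygon it intersects
joined = geopandas.sjoin(cities, states, how="inner")

print(joined[["city", "state", "population"]])
```

With `how="inner"`, cities that fall outside every polygon are dropped from the result.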

This gives us an intersection dataset: each city row now carries the fields of its state. However, we want a dataset grouped by state.

For this we group by state and sum the populations of its cities:

grouped = intersect.groupby(['id','state'])['population'].agg('sum')
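To show what this aggregation returns, here is the same call on a tiny made-up table. The rows are hypothetical; only the column names follow the code above.

```python
import pandas as pd

# Made-up rows standing in for the intersected cities
cities = pd.DataFrame({
    "id": [1, 1, 2],
    "state": ["A", "A", "B"],
    "population": [100, 200, 300],
})

# Sum city populations per state; the result is a Series indexed by (id, state)
grouped = cities.groupby(["id", "state"])["population"].agg("sum")

# reset_index turns the index levels back into ordinary columns
grouped_df = grouped.reset_index()

print(grouped_df)
```

The grouped result keeps `id` as an index level, which is what allows it to be matched back to the states by `id` afterwards.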

We merge the result with the original dataset to recover its geometry column:

merge = pd.merge(geo_df, grouped, on="id")

The result

Finally, we plot the result with the plotting tools provided by GeoPandas and matplotlib:
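A minimal sketch of such a plot is shown here, using made-up data in place of the merged dataset. The `merged` frame, the `OrRd` color map, the title, and the figure size are all assumptions chosen for illustration, not necessarily the settings behind the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import geopandas
from shapely.geometry import box

# Made-up stand-in for the merged dataset: two square "states" with populations
merged = geopandas.GeoDataFrame(
    {"state": ["A", "B"], "population": [300, 900]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# column= colors each polygon by its value; legend=True adds a color bar
ax = merged.plot(column="population", cmap="OrRd", legend=True, figsize=(8, 6))
ax.set_title("Population by state")
ax.set_axis_off()
```

On the real data, the same `plot` call would be applied to the merged GeoDataFrame with `column="population"`.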

The result is a heat map of the states of Mexico by population!

Conclusion

Python and GeoPandas are very useful for analyzing not only numerical and categorical data but also geographic data, enriching the analysis by adding the spatial dimension.

Jupyter Notebook:

Thanks for reading, have a nice day :)
