How to Join Attribute Data to a GeoDataFrame in Python

Problem statement

A common GIS task is attaching non-spatial attribute data to existing spatial features. For example, you may have:

  • a shapefile or GeoJSON of district boundaries loaded into a GeoDataFrame
  • a CSV file with population, land use, or survey results
  • a shared ID field such as district_id

In GeoPandas, this is an attribute join, not a spatial join. You are matching rows by a common column, not by location or geometry overlap.

This page shows the typical GeoPandas workflow for an attribute join: reading spatial data, loading a CSV, checking the join key, merging the tables, verifying the result, and exporting the joined layer.

Quick answer

Use GeoDataFrame.merge() on a shared key column.

The safest pattern is to keep the GeoDataFrame on the left side of the join:

joined = gdf.merge(df, on="district_id", how="left")

This preserves the geometry column and returns a GeoDataFrame. In most GIS workflows:

  • use how="left" to keep all spatial features
  • use how="inner" to keep only matched features
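
To make the difference concrete, here is a minimal sketch using a tiny in-memory dataset. The district IDs, point geometries, and population values are invented for illustration and stand in for data you would normally read from files.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical spatial layer: three districts (points stand in for polygons).
gdf = gpd.GeoDataFrame(
    {"district_id": ["A", "B", "C"]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)

# Hypothetical attribute table: there is no record for district "C".
df = pd.DataFrame({"district_id": ["A", "B"], "population_2024": [1200, 850]})

left = gdf.merge(df, on="district_id", how="left")    # keeps all 3 features
inner = gdf.merge(df, on="district_id", how="inner")  # keeps the 2 matched features

print(len(left), len(inner))
print(type(left).__name__)
```

In the left join, district "C" is still present but has no population value; in the inner join it is dropped.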

Step-by-step solution

Load the spatial dataset into a GeoDataFrame

Read your shapefile or GeoJSON with GeoPandas and inspect the key field.

import geopandas as gpd

gdf = gpd.read_file("data/district_boundaries.shp")

print(gdf.head())
print(gdf.columns)
print(gdf.geometry.name)
print(len(gdf))

Check that:

  • the expected join column exists, such as district_id
  • the geometry column is present
  • the dataset loaded as a GeoDataFrame

print(type(gdf))
print(gdf[["district_id", "geometry"]].head())

Load the attribute table

Read the non-spatial table with pandas.

import pandas as pd

df = pd.read_csv("data/district_population.csv")

print(df.head())
print(df.columns)
print(len(df))

For a typical GIS attribute join, keep only the columns you need:

df = df[["district_id", "population_2024", "households"]]

This helps avoid unnecessary duplicate fields in the output.
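
If district IDs carry leading zeros or must stay as text, it may help to fix the key's type at read time rather than afterwards. The sketch below uses an inline CSV string (invented sample data) in place of a real file; dtype is a standard pandas.read_csv parameter.

```python
import io
import pandas as pd

# Invented sample standing in for data/district_population.csv.
csv_text = "district_id,population_2024,households\n001,1200,400\n002,850,310\n"

# Default parsing infers integers, so "001" becomes 1.
as_inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the key to str keeps the value exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"district_id": str})

print(as_inferred["district_id"].tolist())
print(as_text["district_id"].tolist())
```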

Check that the join key matches in both tables

Before joining, verify that the key field has the same name and compatible data type in both tables.

print(gdf["district_id"].dtype)
print(df["district_id"].dtype)

A common problem is one table storing IDs as integers and the other as strings. Another common issue is extra whitespace.

Clean both sides if needed:

gdf["district_id"] = gdf["district_id"].astype(str).str.strip()
df["district_id"] = df["district_id"].astype(str).str.strip()
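
One caveat with astype(str): if a key column was read as float (for example because the source had blank cells), 101.0 becomes the string "101.0", which will not match "101". A possible workaround, sketched here with a hypothetical normalize_key helper, is to pass through a nullable integer type first. It assumes the IDs are whole numbers.

```python
import pandas as pd

def normalize_key(series: pd.Series) -> pd.Series:
    """Hypothetical helper: turn whole-number IDs into clean strings."""
    if pd.api.types.is_float_dtype(series):
        # Int64 is pandas' nullable integer type, so missing IDs stay missing.
        series = series.astype("Int64")
    return series.astype("string").str.strip()

float_ids = pd.Series([101.0, 102.0, None])

naive = float_ids.astype(str)       # keeps the trailing ".0"
clean = normalize_key(float_ids)    # whole-number strings, missing stays missing

print(naive.tolist())
print(clean.tolist())
```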

Also check for duplicates in the attribute table:

duplicate_keys = df[df["district_id"].duplicated(keep=False)]
print(duplicate_keys)

If the CSV should contain one row per district, duplicates need to be fixed before merging.
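
It can also help to compare the two sets of key values before merging, so you know in advance which features will end up without attributes. This is a sketch with made-up IDs; compare_keys is a hypothetical helper name, not a GeoPandas function.

```python
import pandas as pd

def compare_keys(left_keys: pd.Series, right_keys: pd.Series) -> dict:
    """Hypothetical helper: report key values found on only one side."""
    left_set = set(left_keys.dropna())
    right_set = set(right_keys.dropna())
    return {
        "only_in_left": sorted(left_set - right_set),
        "only_in_right": sorted(right_set - left_set),
    }

spatial_ids = pd.Series(["101", "102", "103"])  # stands in for gdf["district_id"]
table_ids = pd.Series(["101", "102", "104"])    # stands in for df["district_id"]

report = compare_keys(spatial_ids, table_ids)
print(report)
```

Values listed under only_in_left would become null rows in a left join; values under only_in_right would be ignored by it.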

Join the attribute table to the GeoDataFrame

Use a column-based merge. A left join is usually the right default for GIS because it keeps all spatial features.

joined = gdf.merge(df, on="district_id", how="left")

If you only want features with matching attribute records, use an inner join:

matched_only = gdf.merge(df, on="district_id", how="inner")

To join on columns with different names:

joined = gdf.merge(
    df,
    left_on="district_id",
    right_on="DIST_ID",
    how="left"
)
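
With left_on and right_on, pandas keeps both key columns in the output. A minimal sketch of dropping the redundant one follows; the data is invented, and DIST_ID is the column name used in the snippet above.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"district_id": ["101", "102"]},
    geometry=[Point(0, 0), Point(1, 1)],
)
df = pd.DataFrame({"DIST_ID": ["101", "102"], "households": [400, 310]})

joined = gdf.merge(df, left_on="district_id", right_on="DIST_ID", how="left")

# Both key columns survive the merge; drop the one from the attribute table.
joined = joined.drop(columns="DIST_ID")
print(list(joined.columns))
```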

Verify the result

Check the output after the merge.

print(type(joined))
print(len(joined))
print(joined.columns)
print(joined[["district_id", "population_2024", "geometry"]].head())

You want to confirm:

  • the expected attribute columns were added
  • the geometry column still exists
  • the row count makes sense

You can also check how many spatial features did not get a match:

missing = joined["population_2024"].isna().sum()
print(f"Rows with no matching population record: {missing}")
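
Counting NaN values cannot distinguish a feature with no matching row from a matched row whose value is blank in the CSV. One way to separate the two, assuming the standard pandas indicator parameter (which GeoDataFrame.merge passes through to pandas), is sketched below with invented data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"district_id": ["101", "102", "103"]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)
# "102" is present but has no population value; "103" is absent entirely.
df = pd.DataFrame({"district_id": ["101", "102"], "population_2024": [1200, None]})

checked = gdf.merge(df, on="district_id", how="left", indicator=True)

# The _merge column is "both" for matched rows and "left_only" for unmatched ones.
unmatched = checked[checked["_merge"] == "left_only"]
null_values = int(checked["population_2024"].isna().sum())

print(f"Unmatched features: {len(unmatched)}")
print(f"Null population values: {null_values}")
```

Drop the _merge column before exporting if you do not want it in the output layer.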

Save the joined output

Export the result to a GIS format.

joined.to_file("output/district_population.geojson", driver="GeoJSON")

For GeoPackage:

joined.to_file("output/district_population.gpkg", layer="districts", driver="GPKG")

For shapefile:

joined.to_file("output/district_population.shp")

If possible, prefer GeoPackage over shapefile. Shapefile field names are limited to 10 characters, so a column such as population_2024 is truncated on export, and the format supports fewer data types.
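
If shapefile output is unavoidable, one option is to rename long columns explicitly before export so the shortened names are predictable rather than left to the writer. The sketch below flags columns longer than 10 characters (the shapefile field-name limit) and applies a rename mapping. The short names are an illustrative choice, not a convention, and a plain DataFrame stands in for the joined GeoDataFrame.

```python
import pandas as pd

# Stands in for the attribute columns of the joined GeoDataFrame.
joined_attrs = pd.DataFrame(
    {"district_id": ["101"], "population_2024": [1200], "households": [400]}
)

# Illustrative short names, each 10 characters or fewer.
short_names = {"district_id": "dist_id", "population_2024": "pop_2024"}

too_long = [c for c in joined_attrs.columns if len(c) > 10]
print(f"Columns over 10 characters: {too_long}")

renamed = joined_attrs.rename(columns=short_names)
print(list(renamed.columns))
```

The same rename(columns=...) call works on a GeoDataFrame.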

Code examples

Example 1: Join a CSV of population data to polygon boundaries

This example shows a common pattern for joining a CSV to a GeoDataFrame in Python.

import geopandas as gpd
import pandas as pd

districts = gpd.read_file("data/districts.shp")
population = pd.read_csv("data/population_by_district.csv")

districts["district_id"] = districts["district_id"].astype(str).str.strip()
population["district_id"] = population["district_id"].astype(str).str.strip()

result = districts.merge(
    population[["district_id", "population_2024"]],
    on="district_id",
    how="left"
)

print(result[["district_id", "population_2024"]].head())
print(type(result))

Example 2: Fix mismatched join key types before merging

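A key type mismatch is a common reason a GeoPandas column-based merge fails. Depending on the pandas version and the dtypes involved, the merge may raise an error or run and produce no matches.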

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("data/parcels.geojson")
df = pd.read_csv("data/parcel_values.csv")

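print(gdf["parcel_id"].dtype)  # e.g. object (strings)
print(df["parcel_id"].dtype)   # e.g. int64

# Convert both keys to strings so the values compare equal
df["parcel_id"] = df["parcel_id"].astype(str).str.strip()
gdf["parcel_id"] = gdf["parcel_id"].astype(str).str.strip()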

joined = gdf.merge(df, on="parcel_id", how="left")

Example 3: Keep all geometries with a left join

Use this pattern when you want to add an attribute table to a GeoDataFrame without dropping features that have no match. Here gdf and df share a site_id key, and status is a column that comes from df.

joined = gdf.merge(df, on="site_id", how="left")

unmatched = joined[joined["status"].isna()]
print(f"Unmatched features: {len(unmatched)}")

This is the usual GeoPandas left-join workflow for administrative boundaries, parcels, utility assets, or survey zones.

Example 4: Export the joined GeoDataFrame

After you join the DataFrame to the GeoDataFrame by key, save it to a format such as GeoPackage or GeoJSON, which keeps full field names alongside the geometry.

joined.to_file("output/sites_with_status.gpkg", layer="sites", driver="GPKG")
joined.to_file("output/sites_with_status.geojson", driver="GeoJSON")

Explanation

An attribute join matches records by a shared column such as district_id, parcel_id, or code. This is different from a spatial join, which matches features by location, intersection, containment, or nearest distance.

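The left table matters. If the left object is a GeoDataFrame, the result keeps the geometry and remains spatial. If a plain DataFrame is on the left, as in df.merge(gdf, ...), the result is a plain DataFrame even though it still contains a geometry column.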

joined = gdf.merge(df, on="district_id", how="left")

This is the standard way to merge a pandas DataFrame with a GeoDataFrame while preserving geometry.
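
The following sketch, using invented data, shows what happens when the order is reversed and one way to recover: wrapping the merged table in gpd.GeoDataFrame again and passing the original CRS.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"district_id": ["101", "102"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)
df = pd.DataFrame({"district_id": ["101", "102"], "households": [400, 310]})

# DataFrame on the left: the result is a plain pandas DataFrame.
plain = df.merge(gdf, on="district_id", how="left")
print(isinstance(plain, gpd.GeoDataFrame))

# Rebuild a GeoDataFrame from the merged table.
restored = gpd.GeoDataFrame(plain, geometry="geometry", crs=gdf.crs)
print(isinstance(restored, gpd.GeoDataFrame))
```

Keeping the GeoDataFrame on the left avoids the extra step.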

In practical GIS terms:

  • left join: keep all spatial features, add attributes where matches exist
  • inner join: keep only features that have a matching record in the attribute table

A duplicate key in either table can create multiple output rows for one or more features. That may be valid in a one-to-many workflow, but it often indicates bad source data.

Edge cases and notes

Join key contains duplicates

If the non-spatial table has repeated IDs, one feature may be duplicated in the output.

df["district_id"].value_counts().loc[lambda s: s > 1]

Duplicates in either table can increase the number of output rows after the merge, so run the same check on the GeoDataFrame's key column as well.

If you expect one record per feature, remove or fix duplicates before the merge.
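
As a guard, pandas can check the expected relationship during the merge itself through the validate parameter. The sketch below uses plain DataFrames with invented values to keep it short; since GeoDataFrame.merge delegates to pandas, the same argument should work there too.

```python
import pandas as pd

# Stands in for the GeoDataFrame's key column (one row per district).
features = pd.DataFrame({"district_id": ["101", "102"]})

# Attribute table with an accidental duplicate of "101".
attrs = pd.DataFrame(
    {"district_id": ["101", "101", "102"], "population_2024": [1200, 1250, 850]}
)

try:
    features.merge(attrs, on="district_id", how="left", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError as err:
    # Raised because the key is not unique in the right-hand table.
    duplicate_detected = True
    print(f"Merge rejected: {err}")
```

Use validate="many_to_one" instead if several features are allowed to share one attribute record.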

Join key has different data types

101 and "101" do not match: one is an integer and the other is a string. Convert both columns to the same type before joining.

gdf["district_id"] = gdf["district_id"].astype(str)
df["district_id"] = df["district_id"].astype(str)

Missing matches create null values

In a left join, unmatched features get NaN in the new columns. That is expected behavior. Review those records after the merge.
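
A side effect worth knowing: when an integer column gains NaN values, pandas upcasts it to float, so 1200 becomes 1200.0. The sketch below (invented data, plain DataFrames standing in for the GeoDataFrame) shows one possible fix using the nullable Int64 dtype. Whether a given output driver handles that dtype is something to test in your own environment.

```python
import pandas as pd

features = pd.DataFrame({"district_id": ["101", "102", "103"]})
attrs = pd.DataFrame({"district_id": ["101", "102"], "population_2024": [1200, 850]})

joined = features.merge(attrs, on="district_id", how="left")
dtype_after_merge = joined["population_2024"].dtype
print(dtype_after_merge)  # integers were upcast to float to hold NaN

# Option 1: keep the gap but restore integer values with a nullable dtype.
joined["population_2024"] = joined["population_2024"].astype("Int64")
print(joined["population_2024"].dtype)

# Option 2 (only if zero is a meaningful default):
# joined["population_2024"] = joined["population_2024"].fillna(0).astype(int)
```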

CRS issues

CRS does not affect a pure attribute join because the match is based on columns, not spatial relationships. Still, check CRS before exporting or using the data in later spatial operations.

print(joined.crs)

If you need to change coordinate systems after the join, see How to Reproject Spatial Data in Python (GeoPandas).

Invalid geometries

Invalid geometries usually do not break an attribute join, but they can cause problems in later GIS processing. If the joined layer will be used for overlays, buffers, or spatial joins, validate geometry after loading.

invalid = ~gdf.is_valid
print(f"Invalid geometries: {invalid.sum()}")
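
If invalid geometries turn up, one possible repair is the make_valid() method on the geometry column, which is available in recent GeoPandas releases. The sketch below builds a deliberately invalid "bowtie" polygon to show the check and the repair; the site_id values are invented.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A self-intersecting "bowtie" polygon is a classic invalid geometry.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

gdf = gpd.GeoDataFrame({"site_id": ["a", "b"]}, geometry=[bowtie, square])
invalid_before = int((~gdf.is_valid).sum())
print(f"Invalid before: {invalid_before}")

# make_valid() returns repaired geometries; the geometry type may change.
gdf["geometry"] = gdf.geometry.make_valid()
invalid_after = int((~gdf.is_valid).sum())
print(f"Invalid after: {invalid_after}")
```

Review repaired features before relying on them, since the repair can alter the shape or type of a geometry.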

Shapefile field name and type limitations

If your workflow ends with shapefile export, watch for truncated field names: shapefile field names are limited to 10 characters, so longer column names are cut off on write. Use GeoPackage when possible.

If you need the conceptual difference between column joins and location-based joins, see Attribute join vs spatial join in GeoPandas.

For the data loading step, see How to Read a Shapefile in Python with GeoPandas.

If your task depends on geometry relationships instead of shared IDs, see How to spatially join two GeoDataFrames in Python.

If your merge runs but produces null values or no matches, see Why a GeoPandas merge returns missing values or no matches.

For exporting the result, see How to Export GeoJSON in Python with GeoPandas.

FAQ

How do I join a CSV to a GeoDataFrame in GeoPandas?

Read the spatial layer with GeoPandas, read the CSV with pandas, then merge on a shared key:

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("districts.shp")
df = pd.read_csv("population.csv")
joined = gdf.merge(df, on="district_id", how="left")

What is the difference between merge() and a spatial join in GeoPandas?

merge() joins rows by matching column values such as IDs or codes. A spatial join matches features by spatial relationship such as intersects, within, or nearest.

Why does my join return null values for the new columns?

Usually because:

  • the key values do not match exactly
  • one side is string and the other is integer
  • there is whitespace or inconsistent formatting
  • some spatial features have no matching record in the table

How do I keep geometry after merging a pandas DataFrame with a GeoDataFrame?

Keep the GeoDataFrame on the left side:

joined = gdf.merge(df, on="id", how="left")

This preserves geometry and returns a GeoDataFrame.