How to Join Attribute Data to a GeoDataFrame in Python

Problem statement

A common GIS task is attaching non-spatial attribute data to existing spatial features. For example, you may have:

a shapefile or GeoJSON of district boundaries loaded into a GeoDataFrame
a CSV file with population, land use, or survey results
a shared ID field such as district_id

In GeoPandas, this is an attribute join, not a spatial join. You are matching rows by a common column, not by location or geometry overlap.

This page shows the typical GeoPandas workflow for an attribute join: reading spatial data, loading a CSV, checking the join key, merging the tables, verifying the result, and exporting the joined layer.

Quick answer

Use GeoDataFrame.merge() on a shared key column.

The safest pattern is to keep the GeoDataFrame on the left side of the join:

joined = gdf.merge(df, on="district_id", how="left")

This preserves the geometry column and returns a GeoDataFrame. In most GIS workflows:

use how="left" to keep all spatial features
use how="inner" to keep only matched features
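As a minimal sketch of the difference between the two join types, the snippet below builds a small in-memory layer and table. The point geometries, IDs, and population values are hypothetical stand-ins for a real district layer and CSV.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-ins for the district layer and the attribute table.
gdf = gpd.GeoDataFrame(
    {"district_id": ["A", "B", "C"]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
    crs="EPSG:4326",
)
df = pd.DataFrame({"district_id": ["A", "B"], "population_2024": [1200, 850]})

# Left join: every feature is kept, district "C" gets a null population.
left_joined = gdf.merge(df, on="district_id", how="left")

# Inner join: district "C" is dropped because it has no matching row.
inner_joined = gdf.merge(df, on="district_id", how="inner")

print(len(left_joined), len(inner_joined))
print(type(left_joined).__name__)
```

Comparing the two lengths is a quick way to see how many features an inner join would silently drop.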

Diagnosing a join that returns null columns

Flowchart: diagnosing an attribute join that returns null columns.

Step-by-step solution

Load the spatial dataset into a GeoDataFrame

Read your shapefile or GeoJSON with GeoPandas and inspect the key field.

import geopandas as gpd

gdf = gpd.read_file("data/district_boundaries.shp")

print(gdf.head())
print(gdf.columns)
print(gdf.geometry.name)
print(len(gdf))

Check that:

the expected join column exists, such as district_id
the geometry column is present
the dataset loaded as a GeoDataFrame

print(type(gdf))
print(gdf[["district_id", "geometry"]].head())

Load the attribute table

Read the non-spatial table with pandas.

import pandas as pd

df = pd.read_csv("data/district_population.csv")

print(df.head())
print(df.columns)
print(len(df))

For a typical GIS attribute join, keep only the columns you need:

df = df[["district_id", "population_2024", "households"]]

This helps avoid unnecessary duplicate fields in the output.
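Column selection and key typing can also be done while reading the file. The sketch below assumes a CSV whose IDs have leading zeros; the inline text is a hypothetical stand-in for data/district_population.csv, and the notes column is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = io.StringIO(
    "district_id,population_2024,households,notes\n"
    "001,1200,400,ok\n"
    "002,850,310,ok\n"
)

df = pd.read_csv(
    csv_text,
    usecols=["district_id", "population_2024", "households"],  # skip unused columns
    dtype={"district_id": str},  # keep "001" from being parsed as the integer 1
)

print(df["district_id"].tolist())
print(df.columns.tolist())
```

Reading the key as a string at load time avoids losing leading zeros, which cannot be recovered by converting the column afterwards.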

Check that the join key matches in both tables

Before joining, verify that the key field has the same name and compatible data type in both tables.

print(gdf["district_id"].dtype)
print(df["district_id"].dtype)

A common problem is one table storing IDs as integers and the other as strings. Another common issue is extra whitespace.

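Clean both sides if needed. Note that astype(str) turns a float ID such as 101.0 into "101.0" and a missing value into the string "nan", so inspect the converted values before merging: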

gdf["district_id"] = gdf["district_id"].astype(str).str.strip()
df["district_id"] = df["district_id"].astype(str).str.strip()

Also check for duplicates in the attribute table:

duplicate_keys = df[df["district_id"].duplicated(keep=False)]
print(duplicate_keys)

If the CSV should contain one row per district, duplicates need to be fixed before merging.
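Beyond names and types, it can help to compare the actual key values on each side before merging. This is a rough sketch using set differences; the two Series are hypothetical stand-ins for gdf["district_id"] and df["district_id"].

```python
import pandas as pd

# Hypothetical key columns from the spatial layer and the attribute table.
spatial_keys = pd.Series(["101", "102", "103"], name="district_id")
table_keys = pd.Series(["101", "103", "104"], name="district_id")

# Keys present on one side only will not match in the merge.
missing_in_table = sorted(set(spatial_keys) - set(table_keys))
missing_in_layer = sorted(set(table_keys) - set(spatial_keys))

print(f"Features with no attribute row: {missing_in_table}")
print(f"Attribute rows with no feature: {missing_in_layer}")
```

If either list is longer than expected, the cause is usually formatting (padding, case, whitespace) rather than genuinely missing records.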

Join the attribute table to the GeoDataFrame

Use a column-based merge. A left join is usually the right default for GIS because it keeps all spatial features.

joined = gdf.merge(df, on="district_id", how="left")

If you only want features with matching attribute records, use an inner join:

matched_only = gdf.merge(df, on="district_id", how="inner")

To join on columns with different names:

joined = gdf.merge(
    df,
    left_on="district_id",
    right_on="DIST_ID",
    how="left"
)
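When the key columns have different names, the merged result carries both of them. A small sketch of dropping the redundant one follows; plain DataFrames with hypothetical values are used here for brevity, and the same calls apply when the left table is a GeoDataFrame.

```python
import pandas as pd

# Hypothetical stand-ins; the left table would normally be the GeoDataFrame.
left = pd.DataFrame({"district_id": ["A", "B"], "name": ["North", "South"]})
right = pd.DataFrame({"DIST_ID": ["A", "B"], "population_2024": [1200, 850]})

joined = left.merge(right, left_on="district_id", right_on="DIST_ID", how="left")

# Both key columns are present after the merge; keep only the left one.
joined = joined.drop(columns="DIST_ID")

print(joined.columns.tolist())
```

Renaming the right-hand key to match before merging, then using on=, is an equally reasonable alternative.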

Verify the result

Check the output after the merge.

print(type(joined))
print(len(joined))
print(joined.columns)
print(joined[["district_id", "population_2024", "geometry"]].head())

You want to confirm:

the expected attribute columns were added
the geometry column still exists
the row count makes sense

You can also check how many spatial features did not get a match:

missing = joined["population_2024"].isna().sum()
print(f"Rows with no matching population record: {missing}")
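Counting nulls cannot tell an unmatched feature apart from a matched record whose value was already empty in the source table. One way to separate the two, sketched here with hypothetical data, is the indicator parameter of merge(), which adds a _merge column describing where each row came from.

```python
import pandas as pd

# Hypothetical data: "B" has a record with an empty value, "C" has no record.
left = pd.DataFrame({"district_id": ["A", "B", "C"]})
right = pd.DataFrame({"district_id": ["A", "B"], "population_2024": [1200, None]})

checked = left.merge(right, on="district_id", how="left", indicator=True)

# Rows that found no partner in the attribute table.
no_match = checked[checked["_merge"] == "left_only"]

# Rows that matched, but whose source value was already null.
null_value = checked[(checked["_merge"] == "both") & checked["population_2024"].isna()]

# Remove the helper column before exporting.
checked = checked.drop(columns="_merge")

print(no_match["district_id"].tolist(), null_value["district_id"].tolist())
```

The same pattern applies when the left table is a GeoDataFrame.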

Save the joined output

Export the result to a GIS format.

joined.to_file("output/district_population.geojson", driver="GeoJSON")

For GeoPackage:

joined.to_file("output/district_population.gpkg", layer="districts", driver="GPKG")

For shapefile:

joined.to_file("output/district_population.shp")

If possible, prefer GeoPackage over shapefile because shapefiles have stricter field name and data type limits.
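If shapefile output is unavoidable, one option is to shorten long column names yourself before export rather than relying on automatic truncation. The sketch below assumes the 10-character field name limit of the shapefile attribute format; the short names are hand-picked examples, not a convention.

```python
import pandas as pd

# Hypothetical joined table; rename() works the same way on a GeoDataFrame.
joined = pd.DataFrame(
    {"district_id": ["A"], "population_2024": [1200], "households": [400]}
)

# Find columns that exceed the shapefile field name limit.
too_long = [name for name in joined.columns if len(name) > 10]

# Hand-picked replacements (an assumption; choose names that suit your schema).
short_names = {"district_id": "dist_id", "population_2024": "pop_2024"}
shp_ready = joined.rename(columns=short_names)

print(too_long)
print(shp_ready.columns.tolist())
```

Choosing the names explicitly keeps them readable and avoids collisions between columns that share their first ten characters.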

Code examples

Example 1: Join a CSV of population data to polygon boundaries

This example shows a common pattern for joining a CSV to a GeoDataFrame in Python.

import geopandas as gpd
import pandas as pd

districts = gpd.read_file("data/districts.shp")
population = pd.read_csv("data/population_by_district.csv")

districts["district_id"] = districts["district_id"].astype(str).str.strip()
population["district_id"] = population["district_id"].astype(str).str.strip()

result = districts.merge(
    population[["district_id", "population_2024"]],
    on="district_id",
    how="left"
)

print(result[["district_id", "population_2024"]].head())
print(type(result))

Example 2: Fix mismatched join key types before merging

This is a common reason a GeoPandas column-based merge appears to fail.

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("data/parcels.geojson")
df = pd.read_csv("data/parcel_values.csv")

print(gdf["parcel_id"].dtype)  # e.g. object (string IDs)
print(df["parcel_id"].dtype)   # e.g. int64

df["parcel_id"] = df["parcel_id"].astype(str)
gdf["parcel_id"] = gdf["parcel_id"].astype(str)

joined = gdf.merge(df, on="parcel_id", how="left")

Example 3: Keep all geometries with a left join

Use this pattern when you want to add an attribute table to a GeoDataFrame without dropping features that have no match.

joined = gdf.merge(df, on="site_id", how="left")

# Assumes "status" comes from df and has no nulls of its own
unmatched = joined[joined["status"].isna()]
print(f"Unmatched features: {len(unmatched)}")

This is the usual GeoPandas left-join workflow for administrative boundaries, parcels, utility assets, or survey zones.

Example 4: Export the joined GeoDataFrame

After you join the DataFrame to the GeoDataFrame by key, save it to a format that preserves field names and geometry cleanly.

joined.to_file("output/sites_with_status.gpkg", layer="sites", driver="GPKG")
joined.to_file("output/sites_with_status.geojson", driver="GeoJSON")

Explanation

An attribute join matches records by a shared column such as district_id, parcel_id, or code. This is different from a spatial join, which matches features by location, intersection, containment, or nearest distance.

The left table matters. If the left object is a GeoDataFrame, the result keeps the geometry and remains spatial:

joined = gdf.merge(df, on="district_id", how="left")

This is the standard way to merge a pandas DataFrame with a GeoDataFrame while preserving geometry.
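To illustrate why the left table matters, the sketch below reverses the order using hypothetical in-memory data. According to the GeoPandas documentation, a merge called from the plain DataFrame is expected to return a plain DataFrame; the last line shows one way to rebuild a GeoDataFrame if that has already happened.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-ins for the spatial layer and the attribute table.
gdf = gpd.GeoDataFrame(
    {"district_id": ["A", "B"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)
df = pd.DataFrame({"district_id": ["A", "B"], "population_2024": [1200, 850]})

# Table on the left: the geometry column survives, but the result is
# expected to be a plain DataFrame without GeoDataFrame methods.
table_first = df.merge(gdf, on="district_id", how="left")
print(type(table_first).__name__)

# Rebuild the spatial object, naming the geometry column and CRS explicitly.
restored = gpd.GeoDataFrame(table_first, geometry="geometry", crs=gdf.crs)
print(type(restored).__name__)
```

Calling merge() from the GeoDataFrame in the first place avoids the extra step.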

In practical GIS terms:

left join: keep all spatial features, add attributes where matches exist
inner join: keep only features that have a matching record in the attribute table

A duplicate key in either table can create multiple output rows for one or more features. That may be valid in a one-to-many workflow, but it often indicates bad source data.

Edge cases and notes

Join key contains duplicates

If the non-spatial table has repeated IDs, one feature may be duplicated in the output.

df["district_id"].value_counts().loc[lambda s: s > 1]

Duplicates in either table can increase the number of output rows after the merge.

If you expect one record per feature, remove or fix duplicates before the merge.
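To make the merge itself enforce that expectation, pandas offers a validate parameter on merge(). The following sketch, with hypothetical duplicate rows, shows it rejecting a table that repeats a key.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: the attribute table repeats district "A".
left = pd.DataFrame({"district_id": ["A", "B"]})
right = pd.DataFrame(
    {"district_id": ["A", "A", "B"], "population_2024": [1200, 1250, 850]}
)

try:
    # "one_to_one" requires unique keys on both sides.
    left.merge(right, on="district_id", how="left", validate="one_to_one")
    duplicate_detected = False
except MergeError as exc:
    duplicate_detected = True
    print(f"Merge rejected: {exc}")
```

Failing loudly here is usually preferable to discovering duplicated polygons later in a map or an area calculation.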

Join key has different data types

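101 and "101" do not match, and recent pandas versions typically raise a ValueError when asked to merge an integer column with a string column. Convert both columns to the same type before joining.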

gdf["district_id"] = gdf["district_id"].astype(str)
df["district_id"] = df["district_id"].astype(str)

Missing matches create null values

In a left join, unmatched features get NaN in the new columns. That is expected behavior. Review those records after the merge.
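A side effect worth knowing: because NaN cannot be stored in a regular integer column, an integer attribute that gains nulls in the join is converted to float. The sketch below, using hypothetical values, shows the change and one possible fix with the pandas nullable Int64 type.

```python
import pandas as pd

# Hypothetical data: district "C" has no matching record.
left = pd.DataFrame({"district_id": ["A", "B", "C"]})
right = pd.DataFrame({"district_id": ["A", "B"], "population_2024": [1200, 850]})

joined = left.merge(right, on="district_id", how="left")

# The integer column became float to make room for the missing value.
dtype_after_merge = joined["population_2024"].dtype
print(dtype_after_merge)

# Optional: switch to the nullable integer type to keep whole numbers.
joined["population_2024"] = joined["population_2024"].astype("Int64")
print(joined["population_2024"].dtype)
```

Whether a given output format writes nullable integers cleanly is worth checking on a small sample before exporting a full layer.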

CRS issues

CRS does not affect a pure attribute join because the match is based on columns, not spatial relationships. Still, check CRS before exporting or using the data in later spatial operations.

print(joined.crs)

If you need to change coordinate systems after the join, see How to Reproject Spatial Data in Python (GeoPandas).

Invalid geometries

Invalid geometries usually do not break an attribute join, but they can cause problems in later GIS processing. If the joined layer will be used for overlays, buffers, or spatial joins, validate geometry after loading.

invalid = ~gdf.is_valid
print(f"Invalid geometries: {invalid.sum()}")

Shapefile field name and type limitations

If your workflow ends with shapefile export, watch for truncated field names: shapefile field names are limited to 10 characters, so a column such as population_2024 is shortened on export. Use GeoPackage when possible.

Internal links

If you need the conceptual difference between column joins and location-based joins, see Attribute join vs spatial join in GeoPandas.

For the data loading step, see How to Read a Shapefile in Python with GeoPandas.

If your task depends on geometry relationships instead of shared IDs, see How to spatially join two GeoDataFrames in Python.

If your merge runs but produces null values or no matches, see Why a GeoPandas merge returns missing values or no matches.

For exporting the result, see How to Export GeoJSON in Python with GeoPandas.

FAQ

How do I join a CSV to a GeoDataFrame in GeoPandas?

Read the spatial layer with GeoPandas, read the CSV with pandas, then merge on a shared key:

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file("districts.shp")
df = pd.read_csv("population.csv")
joined = gdf.merge(df, on="district_id", how="left")

What is the difference between `merge()` and a spatial join in GeoPandas?

merge() joins rows by matching column values such as IDs or codes. A spatial join matches features by spatial relationship such as intersects, within, or nearest.

Why does my join return null values for the new columns?

Usually because:

the key values do not match exactly
one side is string and the other is integer
there is whitespace or inconsistent formatting
some spatial features have no matching record in the table

How do I keep geometry after merging a pandas DataFrame with a GeoDataFrame?

Keep the GeoDataFrame on the left side:

joined = gdf.merge(df, on="id", how="left")

This preserves geometry and returns a GeoDataFrame.

How do I join two GeoDataFrames on an index instead of a column?

Use join() for an index-based match, or reset the index and use merge() on the resulting column. merge() with on=, left_on=/right_on= is the more explicit and common choice for attribute joins.
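A brief sketch of both index-based options follows, assuming each table has been indexed by a hypothetical site_id column.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical layer and table, both indexed by site_id.
gdf = gpd.GeoDataFrame(
    {"site_id": ["S1", "S2"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
).set_index("site_id")
df = pd.DataFrame(
    {"site_id": ["S1", "S2"], "status": ["active", "retired"]}
).set_index("site_id")

# Option 1: join() matches on the index by default.
by_join = gdf.join(df, how="left")

# Option 2: merge() with explicit index flags.
by_merge = gdf.merge(df, left_index=True, right_index=True, how="left")

print(by_join["status"].tolist())
```

Both calls start from the GeoDataFrame, so the geometry column is carried through in either case.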

Why do I get `_x` and `_y` suffixes on columns after merging?

Both tables contain a column with the same name. Either drop the duplicate before merging, or control the labels with the suffixes parameter, for example gdf.merge(df, on="id", suffixes=("", "_census")).
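The effect of the suffixes parameter can be seen in a small sketch with hypothetical tables that share a name column:

```python
import pandas as pd

# Hypothetical tables that both contain a "name" column.
left = pd.DataFrame({"id": [1, 2], "name": ["North", "South"]})
right = pd.DataFrame(
    {"id": [1, 2], "name": ["N. District", "S. District"], "population": [1200, 850]}
)

# Default behavior: overlapping columns are labeled _x and _y.
default = left.merge(right, on="id", how="left")

# Custom labels: keep the left name as-is, tag the right one.
labeled = left.merge(right, on="id", how="left", suffixes=("", "_census"))

print(default.columns.tolist())
print(labeled.columns.tolist())
```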

How do I join one attribute row to many spatial features (one-to-many)?

Keep the GeoDataFrame with the repeated key on the left and the single-row table on the right: gdf.merge(lookup, on="region_code", how="left"). Each matching feature receives a copy of the lookup row, which is expected behavior, not duplication of source data.

Can I merge on more than one key column?

Yes. Pass a list to on, for example gdf.merge(df, on=["state", "county"], how="left"). Both tables must contain all listed columns with matching values and compatible types.
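A composite key matters when one column alone is not unique. In this sketch the codes and values are hypothetical; county code "001" appears under two different state codes, so only the pair identifies a row.

```python
import pandas as pd

# Hypothetical codes: county "001" exists in both state "06" and state "32".
left = pd.DataFrame(
    {"state": ["06", "06", "32"], "county": ["001", "003", "001"]}
)
right = pd.DataFrame(
    {"state": ["06", "32"], "county": ["001", "001"], "population": [1100, 2400]}
)

# Both columns must match for a row to be joined.
joined = left.merge(right, on=["state", "county"], how="left")

print(joined)
```

Merging on county alone would have attached both population values to each "001" row.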
