GeoPandas Merge Returns NaN or No Matches: How to Fix It
Problem statement
A common GIS task is attaching attribute data to a spatial layer: joining population numbers to regions, sales figures to territories, or census statistics to tracts. In GeoPandas, this is usually done with merge(), the same key-based join you use in pandas.
The problem is that merge() can run without an error and still leave you with columns full of NaN, far fewer rows than expected, or a result that has lost its geometry. In practice, this usually happens for one of these reasons:
- the join keys have different dtypes, such as
inton one side andstron the other - the key values differ by whitespace or letter case
- leading-zero codes were turned into integers, so
"06037"became6037 - the wrong column was used in
on,left_on, orright_on - the
howargument hides or drops unmatched rows - the keys simply do not overlap between the two datasets
This guide shows how to diagnose each cause and apply the right fix while keeping the geometry intact.
Quick answer
If a merge() produces NaN values or no matches:
- inspect both key columns with
.head()and.dtypes - make the dtypes match, usually with
.astype(str) - clean the strings with
.str.strip()and a consistent case - restore leading zeros with
.str.zfill()if codes were turned into ints - validate the join with
indicator=Trueand check the_mergecolumn
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")
# Normalize keys on both sides before joining
regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip().str.zfill(5)
# Merge into the GeoDataFrame so geometry is preserved
merged = regions.merge(stats, on="code", how="left")
print(merged.head())
Diagnosing a failed merge
Step-by-step solution
Step 1: Inspect both key columns before merging
Before changing anything, look at the actual values and types on both sides.
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")
print(regions[["code"]].head())
print(stats[["code"]].head())
print("regions dtypes:\n", regions.dtypes)
print("stats dtypes:\n", stats.dtypes)
Look for two things: do the columns have the same dtype, and do the values look identical? A code that prints as 6037 on one side and 06037 on the other will never match, even though they look almost the same.
Step 2: Make the key dtypes match
If one key is an integer and the other is a string, no rows will match. The safest fix is to cast both to string.
regions["code"] = regions["code"].astype(str)
stats["code"] = stats["code"].astype(str)
print(regions["code"].dtype)
print(stats["code"].dtype)
Casting to string avoids subtle problems with float keys and preserves leading zeros once they are restored. Casting both sides the same way is what matters.
Step 3: Strip whitespace and normalize case
Even with matching dtypes, values like "NY " and "NY" or "abc" and "ABC" will not match.
regions["code"] = regions["code"].str.strip()
stats["code"] = stats["code"].str.strip()
# Optional: normalize case if your keys are text labels
regions["code"] = regions["code"].str.upper()
stats["code"] = stats["code"].str.upper()
A quick way to see whether values overlap at all is to compare the two sets:
left_keys = set(regions["code"])
right_keys = set(stats["code"])
print("Only in regions:", list(left_keys - right_keys)[:5])
print("Only in stats:", list(right_keys - left_keys)[:5])
print("Shared keys:", len(left_keys & right_keys))
If Shared keys is 0, the merge cannot match anything and you need to look more closely at the values.
Step 4: Restore leading zeros on code columns
This is the classic cause of a broken merge. When a CSV column like "06037" is read as a number, it becomes 6037. Pad it back to a fixed width with zfill.
# stats came from a CSV and lost the leading zero
stats["code"] = stats["code"].astype(str).str.zfill(5)
print(stats["code"].head())
zfill(5) pads each value to five characters with leading zeros, so 6037 becomes 06037. Use the width that matches your code system, such as 2 for state FIPS or 11 for census tract GEOIDs.
Step 5: Use the correct on, left_on, and right_on
If the key columns have different names, you cannot use on. Use left_on and right_on instead.
# Same column name in both frames
merged = regions.merge(stats, on="code", how="left")
# Different column names
merged = regions.merge(stats, left_on="region_code", right_on="code", how="left")
Passing the wrong column name to on raises a KeyError, but passing a column that exists yet is not the real key produces a silent flood of NaN instead.
Step 6: Choose the right how and validate
The how argument decides which rows survive. To keep every region and see which ones failed to match, use a left join with an indicator.
merged = regions.merge(stats, on="code", how="left", indicator=True)
print(merged["_merge"].value_counts())
# Rows that exist in regions but found no match in stats
unmatched = merged[merged["_merge"] == "left_only"]
print("Unmatched regions:", len(unmatched))
print(unmatched[["code"]].head())
indicator=True adds a _merge column with values both, left_only, and right_only. This turns a silent failure into something you can count and inspect.
Code examples
Example 1: A working attribute merge that keeps geometry
This is the standard pattern. Call merge() on the GeoDataFrame so the result stays a GeoDataFrame.
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")
regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip().str.zfill(5)
merged = regions.merge(stats, on="code", how="left")
print(type(merged)) # geopandas.GeoDataFrame
print(merged.crs) # CRS is preserved
print(merged.head())
Example 2: Diagnose a dtype mismatch
Here the codes look identical when printed but one column is an integer.
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")
print(regions["code"].dtype) # object
print(stats["code"].dtype) # int64
# This merge silently produces NaN because int != str
broken = regions.merge(stats, on="code", how="left")
print(broken["population"].isna().sum()) # all rows NaN
# Fix: cast both to the same type
stats["code"] = stats["code"].astype(str).str.zfill(5)
fixed = regions.merge(stats, on="code", how="left")
print(fixed["population"].isna().sum()) # 0
Example 3: Validate matches with an indicator
Use pd.merge with indicator=True to see exactly where rows came from.
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")
regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip()
merged = pd.merge(regions, stats, on="code", how="outer", indicator=True)
print(merged["_merge"].value_counts())
# Inspect the keys that did not line up
print(merged.loc[merged["_merge"] != "both", "code"].unique())
Example 4: Different column names with left_on and right_on
import pandas as pd
import geopandas as gpd
regions = gpd.read_file("data/regions.gpkg") # key column: region_id
stats = pd.read_csv("data/stats.csv") # key column: id
regions["region_id"] = regions["region_id"].astype(str).str.strip()
stats["id"] = stats["id"].astype(str).str.strip()
merged = regions.merge(
stats,
left_on="region_id",
right_on="id",
how="left",
)
print(merged.head())
Explanation
A merge() is a key-based join. pandas aligns rows by matching values in the key columns, and a match requires the values to be equal in both type and content. 6037 (an integer) and "06037" (a string) are not equal, so they do not match. Neither do "NY " and "NY". When no rows match, pandas still produces output: with a left join, every left row is kept and the columns from the right frame are filled with NaN. That is why a broken merge looks like a successful one with empty columns.
The geometry question is separate but important. merge() returns the type of the object it is called on. If you call regions.merge(stats, ...) where regions is a GeoDataFrame, the result is a GeoDataFrame and the geometry column and CRS are preserved. If you call stats.merge(regions, ...) where stats is a plain DataFrame, the result is a plain DataFrame and you lose the GeoDataFrame behavior even though a geometry column is present. The rule is simple: keep the GeoDataFrame on the left and call merge() on it.
Because both failures are silent, the fix is always the same shape: inspect the keys, normalize them on both sides, then validate the result with an indicator before trusting it.
Edge cases or notes
Leading zeros in codes
Postal codes, FIPS codes, and many ID systems use leading zeros. CSV and Excel readers often strip them by turning the column into an integer. Cast the column to string and pad it with .str.zfill(width) to restore them. Do this on whichever side lost the zeros, and make sure both sides use the same width.
Float keys
If a key column was read as a float, values can appear as 6037.0. Comparing 6037.0 to "6037" will never match. Convert to integer first if appropriate, then to string: col.astype("int64").astype(str). Watch for NaN values, which block a direct cast to int.
Many-to-many duplicates
If the right frame has duplicate keys, the merge multiplies rows: each left row matches every duplicate, so your row count grows. Check for duplicates with stats["code"].duplicated().sum() before merging, and use validate="one_to_many" or validate="one_to_one" to make pandas raise an error instead of silently expanding the result.
Hidden whitespace and invisible characters
.str.strip() removes leading and trailing spaces, but not interior characters such as non-breaking spaces or a trailing tab. If keys still fail to match after stripping, print repr() of a few values to reveal hidden characters.
Keys that genuinely do not overlap
Sometimes the data simply does not share keys, for example a county dataset merged against a state dataset. No amount of cleaning will help. Compare the two key sets directly and confirm there is real overlap before debugging dtypes.
Internal links
For the standard attribute join workflow, see:
- How to Join Attribute Data to a GeoDataFrame in Python
- What Is a GeoDataFrame? Structure and Key Concepts
If you need a location-based join instead of a key-based one, see:
- How to Perform a Spatial Join in Python (GeoPandas)
- Spatial Join Returns Empty Results in GeoPandas: How to Fix It
To summarize data after joining, see How to Aggregate Spatial Data by Region in GeoPandas.
FAQ
Why does my GeoPandas merge return all NaN values?
The join keys do not match. The most common reasons are a dtype mismatch between an integer and a string, whitespace or case differences in the values, or leading zeros that were stripped when a code column was read as a number. Inspect both key columns and normalize them on each side before merging.
How do I keep the geometry after a merge?
Call merge() on the GeoDataFrame, not on the plain DataFrame. regions.merge(stats, ...) keeps the geometry and CRS, while stats.merge(regions, ...) returns a plain DataFrame and loses the GeoDataFrame behavior.
How can I check which rows failed to match?
Add indicator=True to the merge and inspect the resulting _merge column. Values of left_only are rows from the left frame with no match, and right_only are unmatched rows from the right frame.
Why did my row count grow after merging?
The right frame has duplicate keys, so each left row matches multiple right rows. Check with df["code"].duplicated().sum() and pass validate="one_to_many" or validate="one_to_one" to make pandas raise an error on unexpected duplicates.
Should I use merge() or join() to attach attributes to a GeoDataFrame?
Use merge() when you match on a shared key column, which is the common case for attaching tabular data by code. Use join() only when you are aligning on the DataFrame index, which is rarer for spatial attribute joins.
How do I merge on multiple key columns at once?
Pass a list to on, for example regions.merge(stats, on=["state", "county"], how="left"). Every column in the list must match in dtype and content on both sides, so normalize each one before merging.
Why does merge() add _x and _y suffixes to my columns?
Both frames contain a non-key column with the same name, so pandas disambiguates them with _x (left) and _y (right) suffixes. Drop or rename the duplicate before merging, or set custom labels with suffixes=("_region", "_stats").
My keys are numeric IDs with no leading zeros, so why are they still NaN?
A common cause is that one side was read as a float (6037.0) and the other as an integer or string. Comparing 6037.0 to "6037" never matches, so cast cleanly first with col.astype("int64").astype(str), watching for NaN values that block the integer cast.