GeoPandas Merge Returns NaN or No Matches: How to Fix It

Problem statement

A common GIS task is attaching attribute data to a spatial layer: joining population numbers to regions, sales figures to territories, or census statistics to tracts. In GeoPandas, this is usually done with merge(), the same key-based join you use in pandas.

The problem is that merge() can run without an error and still leave you with columns full of NaN, far fewer rows than expected, or a result that has lost its geometry. In practice, this usually happens for one of these reasons:

the join keys have different dtypes, such as int on one side and str on the other
the key values differ by whitespace or letter case
leading-zero codes were turned into integers, so "06037" became 6037
the wrong column was used in on, left_on, or right_on
the how argument hides or drops unmatched rows
the keys simply do not overlap between the two datasets

This guide shows how to diagnose each cause and apply the right fix while keeping the geometry intact.

Quick answer

If a merge() produces NaN values or no matches:

inspect both key columns with .head() and .dtypes
make the dtypes match, usually with .astype(str)
clean the strings with .str.strip() and a consistent case
restore leading zeros with .str.zfill() if codes were turned into ints
validate the join with indicator=True and check the _merge column

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")

# Normalize keys on both sides before joining
regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip().str.zfill(5)

# Merge into the GeoDataFrame so geometry is preserved
merged = regions.merge(stats, on="code", how="left")
print(merged.head())

Diagnosing a failed merge

Flowchart — Diagnosing a merge that returns NaN or no matches. — Diagnosing a merge that returns NaN or no matches.

Step-by-step solution

Five steps to repair a join key before merging: compare, align dtypes, trim, pad, merge. — Four of these five steps happen before the merge call — that is where the bug lives.

Step 1: Inspect both key columns before merging

Before changing anything, look at the actual values and types on both sides.

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")

print(regions[["code"]].head())
print(stats[["code"]].head())

print("regions dtypes:\n", regions.dtypes)
print("stats dtypes:\n", stats.dtypes)

Look for two things: do the columns have the same dtype, and do the values look identical? A code that prints as 6037 on one side and 06037 on the other will never match, even though they look almost the same.

Step 2: Make the key dtypes match

If one key is an integer and the other is a string, no rows will match. The safest fix is to cast both to string.

regions["code"] = regions["code"].astype(str)
stats["code"] = stats["code"].astype(str)

print(regions["code"].dtype)
print(stats["code"].dtype)

Casting to string avoids subtle problems with float keys and preserves leading zeros once they are restored. Casting both sides the same way is what matters.

Step 3: Strip whitespace and normalize case

Even with matching dtypes, values like "NY " and "NY" or "abc" and "ABC" will not match.

regions["code"] = regions["code"].str.strip()
stats["code"] = stats["code"].str.strip()

# Optional: normalize case if your keys are text labels
regions["code"] = regions["code"].str.upper()
stats["code"] = stats["code"].str.upper()

A quick way to see whether values overlap at all is to compare the two sets:

left_keys = set(regions["code"])
right_keys = set(stats["code"])

print("Only in regions:", list(left_keys - right_keys)[:5])
print("Only in stats:", list(right_keys - left_keys)[:5])
print("Shared keys:", len(left_keys & right_keys))

If Shared keys is 0, the merge cannot match anything and you need to look more closely at the values.

Step 4: Restore leading zeros on code columns

This is the classic cause of a broken merge. When a CSV column like "06037" is read as a number, it becomes 6037. Pad it back to a fixed width with zfill.

# stats came from a CSV and lost the leading zero
stats["code"] = stats["code"].astype(str).str.zfill(5)

print(stats["code"].head())

zfill(5) pads each value to five characters with leading zeros, so 6037 becomes 06037. Use the width that matches your code system, such as 2 for state FIPS or 11 for census tract GEOIDs.

Step 5: Use the correct `on`, `left_on`, and `right_on`

If the key columns have different names, you cannot use on. Use left_on and right_on instead.

# Same column name in both frames
merged = regions.merge(stats, on="code", how="left")

# Different column names
merged = regions.merge(stats, left_on="region_code", right_on="code", how="left")

Passing the wrong column name to on raises a KeyError, but passing a column that exists yet is not the real key produces a silent flood of NaN instead.

Step 6: Choose the right `how` and validate

The how argument decides which rows survive. To keep every region and see which ones failed to match, use a left join with an indicator.

merged = regions.merge(stats, on="code", how="left", indicator=True)

print(merged["_merge"].value_counts())

# Rows that exist in regions but found no match in stats
unmatched = merged[merged["_merge"] == "left_only"]
print("Unmatched regions:", len(unmatched))
print(unmatched[["code"]].head())

indicator=True adds a _merge column with values both, left_only, and right_only. This turns a silent failure into something you can count and inspect.

Code examples

Example 1: A working attribute merge that keeps geometry

This is the standard pattern. Call merge() on the GeoDataFrame so the result stays a GeoDataFrame.

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")

regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip().str.zfill(5)

merged = regions.merge(stats, on="code", how="left")

print(type(merged))          # geopandas.GeoDataFrame
print(merged.crs)            # CRS is preserved
print(merged.head())

Example 2: Diagnose a dtype mismatch

Here the codes look identical when printed but one column is an integer.

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")

print(regions["code"].dtype)   # object
print(stats["code"].dtype)     # int64

# This merge silently produces NaN because int != str
broken = regions.merge(stats, on="code", how="left")
print(broken["population"].isna().sum())   # all rows NaN

# Fix: cast both to the same type
stats["code"] = stats["code"].astype(str).str.zfill(5)
fixed = regions.merge(stats, on="code", how="left")
print(fixed["population"].isna().sum())     # 0

Example 3: Validate matches with an indicator

Use pd.merge with indicator=True to see exactly where rows came from.

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")
stats = pd.read_csv("data/stats.csv")

regions["code"] = regions["code"].astype(str).str.strip()
stats["code"] = stats["code"].astype(str).str.strip()

merged = pd.merge(regions, stats, on="code", how="outer", indicator=True)
print(merged["_merge"].value_counts())

# Inspect the keys that did not line up
print(merged.loc[merged["_merge"] != "both", "code"].unique())

Example 4: Different column names with `left_on` and `right_on`

import pandas as pd
import geopandas as gpd

regions = gpd.read_file("data/regions.gpkg")   # key column: region_id
stats = pd.read_csv("data/stats.csv")          # key column: id

regions["region_id"] = regions["region_id"].astype(str).str.strip()
stats["id"] = stats["id"].astype(str).str.strip()

merged = regions.merge(
    stats,
    left_on="region_id",
    right_on="id",
    how="left",
)

print(merged.head())

Explanation

A merge() is a key-based join. pandas aligns rows by matching values in the key columns, and a match requires the values to be equal in both type and content. 6037 (an integer) and "06037" (a string) are not equal, so they do not match. Neither do "NY " and "NY". When no rows match, pandas still produces output: with a left join, every left row is kept and the columns from the right frame are filled with NaN. That is why a broken merge looks like a successful one with empty columns.

A table of four key mismatches: dtype, trailing space, case, and lost leading zeros. — Screen output hides all four — which is why an empty merge is so puzzling at first.

The geometry question is separate but important. merge() returns the type of the object it is called on. If you call regions.merge(stats, ...) where regions is a GeoDataFrame, the result is a GeoDataFrame and the geometry column and CRS are preserved. If you call stats.merge(regions, ...) where stats is a plain DataFrame, the result is a plain DataFrame and you lose the GeoDataFrame behavior even though a geometry column is present. The rule is simple: keep the GeoDataFrame on the left and call merge() on it.

Because both failures are silent, the fix is always the same shape: inspect the keys, normalize them on both sides, then validate the result with an indicator before trusting it.

Edge cases or notes

Leading zeros in codes

Postal codes, FIPS codes, and many ID systems use leading zeros. CSV and Excel readers often strip them by turning the column into an integer. Cast the column to string and pad it with .str.zfill(width) to restore them. Do this on whichever side lost the zeros, and make sure both sides use the same width.

Float keys

If a key column was read as a float, values can appear as 6037.0. Comparing 6037.0 to "6037" will never match. Convert to integer first if appropriate, then to string: col.astype("int64").astype(str). Watch for NaN values, which block a direct cast to int.

Many-to-many duplicates

If the right frame has duplicate keys, the merge multiplies rows: each left row matches every duplicate, so your row count grows. Check for duplicates with stats["code"].duplicated().sum() before merging, and use validate="one_to_many" or validate="one_to_one" to make pandas raise an error instead of silently expanding the result.

Hidden whitespace and invisible characters

.str.strip() removes leading and trailing spaces, but not interior characters such as non-breaking spaces or a trailing tab. If keys still fail to match after stripping, print repr() of a few values to reveal hidden characters.

Keys that genuinely do not overlap

Sometimes the data simply does not share keys, for example a county dataset merged against a state dataset. No amount of cleaning will help. Compare the two key sets directly and confirm there is real overlap before debugging dtypes.

Internal links

For the standard attribute join workflow, see:

If you need a location-based join instead of a key-based one, see:

To summarize data after joining, see How to Aggregate Spatial Data by Region in GeoPandas.

FAQ

Why does my GeoPandas merge return all NaN values?

The join keys do not match. The most common reasons are a dtype mismatch between an integer and a string, whitespace or case differences in the values, or leading zeros that were stripped when a code column was read as a number. Inspect both key columns and normalize them on each side before merging.

How do I keep the geometry after a merge?

Call merge() on the GeoDataFrame, not on the plain DataFrame. regions.merge(stats, ...) keeps the geometry and CRS, while stats.merge(regions, ...) returns a plain DataFrame and loses the GeoDataFrame behavior.

How can I check which rows failed to match?

Add indicator=True to the merge and inspect the resulting _merge column. Values of left_only are rows from the left frame with no match, and right_only are unmatched rows from the right frame.

Why did my row count grow after merging?

The right frame has duplicate keys, so each left row matches multiple right rows. Check with df["code"].duplicated().sum() and pass validate="one_to_many" or validate="one_to_one" to make pandas raise an error on unexpected duplicates.

Should I use `merge()` or `join()` to attach attributes to a GeoDataFrame?

Use merge() when you match on a shared key column, which is the common case for attaching tabular data by code. Use join() only when you are aligning on the DataFrame index, which is rarer for spatial attribute joins.

How do I merge on multiple key columns at once?

Pass a list to on, for example regions.merge(stats, on=["state", "county"], how="left"). Every column in the list must match in dtype and content on both sides, so normalize each one before merging.

Why does `merge()` add `_x` and `_y` suffixes to my columns?

Both frames contain a non-key column with the same name, so pandas disambiguates them with _x (left) and _y (right) suffixes. Drop or rename the duplicate before merging, or set custom labels with suffixes=("_region", "_stats").

My keys are numeric IDs with no leading zeros, so why are they still NaN?

A common cause is that one side was read as a float (6037.0) and the other as an integer or string. Comparing 6037.0 to "6037" never matches, so cast cleanly first with col.astype("int64").astype(str), watching for NaN values that block the integer cast.

GeoPandas Merge Returns NaN or No Matches: How to Fix It #

Problem statement #

Quick answer #

Diagnosing a failed merge #

Step-by-step solution #

Step 1: Inspect both key columns before merging #

Step 2: Make the key dtypes match #

Step 3: Strip whitespace and normalize case #

Step 4: Restore leading zeros on code columns #

Step 5: Use the correct on, left_on, and right_on #

Step 6: Choose the right how and validate #

Code examples #

Example 1: A working attribute merge that keeps geometry #

Example 2: Diagnose a dtype mismatch #

Example 3: Validate matches with an indicator #

Example 4: Different column names with left_on and right_on #

Explanation #

Edge cases or notes #

Leading zeros in codes #

Float keys #

Many-to-many duplicates #

Hidden whitespace and invisible characters #

Keys that genuinely do not overlap #

Internal links #

FAQ #

Why does my GeoPandas merge return all NaN values? #

How do I keep the geometry after a merge? #

How can I check which rows failed to match? #

Why did my row count grow after merging? #

Should I use merge() or join() to attach attributes to a GeoDataFrame? #

How do I merge on multiple key columns at once? #

Why does merge() add _x and _y suffixes to my columns? #

My keys are numeric IDs with no leading zeros, so why are they still NaN? #