How to Batch Process Multiple Shapefiles in Python

Problem statement

A common GIS task is applying the same operation to many shapefiles in a folder. Doing this manually in QGIS or another desktop tool is slow and easy to get wrong.

Typical examples include:

  • reprojecting many shapefiles to one CRS
  • cleaning or standardizing attribute fields
  • adding calculated columns
  • filtering features based on an attribute
  • exporting processed copies to a new folder

If you need to batch process multiple shapefiles in Python, the practical pattern is simple: find all .shp files, loop through them, run the same GeoPandas logic on each file, and save the results to a separate output folder.

Quick answer

The safest way to batch process shapefiles in Python is:

  1. define an input folder and output folder
  2. list all shapefiles in the input folder
  3. read each file with GeoPandas
  4. pass the GeoDataFrame into a reusable processing function
  5. save the result with a consistent filename
  6. add basic error handling so one bad file does not stop the whole batch

from pathlib import Path
import geopandas as gpd

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/output_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

def process_gdf(gdf):
    gdf = gdf.copy()
    gdf["source"] = "batch_run"
    return gdf

for shp_path in input_folder.glob("*.shp"):
    gdf = gpd.read_file(shp_path)
    result = process_gdf(gdf)
    output_path = output_folder / f"{shp_path.stem}_processed.shp"
    result.to_file(output_path)

This pattern works for reprojection, attribute updates, filtering, clipping, and other folder-based GIS tasks.

Step-by-step solution

Set up input and output folders

Use separate folders for source data and processed outputs. This reduces the risk of overwriting your original shapefiles.

from pathlib import Path

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/output_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

Writing to a new folder is safer than editing files in place, especially during testing.

Get all shapefiles from the folder

Use pathlib to find only .shp files in the input folder. If you also need subfolders, use rglob("*.shp") instead of glob("*.shp").

shapefiles = list(input_folder.glob("*.shp"))

print(f"Found {len(shapefiles)} shapefiles")
for path in shapefiles:
    print(path.name)

This is useful in mixed folders that may also contain .dbf, .shx, .prj, CSV files, or other data.

A shapefile is not just one file. The .shp file must stay with its related sidecar files such as .dbf, .shx, and often .prj.
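To catch incomplete shapefiles before processing starts, you can check that the required sidecar files actually sit next to each .shp file. This is a minimal sketch; the helper names sidecar_paths and is_complete are illustrative, not part of any library:

```python
from pathlib import Path

# Sidecar extensions that must accompany a .shp file for it to be readable.
REQUIRED_SIDECARS = (".dbf", ".shx")

def sidecar_paths(shp_path):
    """Return the required companion files for a given .shp path."""
    return [shp_path.with_suffix(ext) for ext in REQUIRED_SIDECARS]

def is_complete(shp_path):
    """True if every required sidecar file exists on disk."""
    return all(p.exists() for p in sidecar_paths(shp_path))
```

Calling is_complete(shp_path) at the top of the loop and skipping files that return False gives a clearer error message than letting the read step fail.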

Read each shapefile with GeoPandas

Loop through the file paths and load each one into a GeoDataFrame.

import geopandas as gpd

for shp_path in shapefiles:
    gdf = gpd.read_file(shp_path)
    print(shp_path.name, len(gdf), "features")

At this point, gdf is a standard GeoDataFrame, so you can apply any GeoPandas operation you would normally run on a single shapefile.

Apply the same processing logic to each file

The cleanest pattern is to put the repeated GIS logic in a function.

Here is a simple example that adds a new field and reprojects the data to EPSG:4326.

def process_gdf(gdf):
    gdf = gdf.copy()

    if gdf.crs is None:
        raise ValueError("Input shapefile has no CRS defined")

    gdf = gdf.to_crs("EPSG:4326")
    gdf["processed"] = "yes"

    return gdf

This keeps the loop simple and makes it easier to change the processing step later.

For example, if you receive district boundary shapefiles from multiple offices and need them all in WGS84 with one consistent field added, you can use the same function for every file in the folder.

The same structure also works for:

  • selecting records
  • renaming columns
  • calculating areas or lengths
  • clipping to a boundary
  • standardizing schemas
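As one sketch of the same structure applied to record selection and column renaming, the function below keeps features above a threshold and shortens a column name. The field names pop and name_long are placeholders for whatever your own schema uses:

```python
def filter_and_rename(gdf):
    """Select records above a population threshold and rename a column.
    The column names "pop" and "name_long" are illustrative placeholders."""
    gdf = gdf.copy()
    if "pop" in gdf.columns:
        gdf = gdf[gdf["pop"] > 1000]
    if "name_long" in gdf.columns:
        gdf = gdf.rename(columns={"name_long": "name"})
    return gdf
```

Because it only uses standard DataFrame operations, the same function works unchanged on any GeoDataFrame in the loop.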

Save processed shapefiles to a new folder

Build output names consistently so you can trace results back to the original source file.

for shp_path in shapefiles:
    gdf = gpd.read_file(shp_path)
    result = process_gdf(gdf)

    output_path = output_folder / f"{shp_path.stem}_processed.shp"
    result.to_file(output_path)

Using a suffix like _processed avoids name collisions and makes output files easy to identify.

If you write back to shapefile format, remember that shapefiles have limitations such as short field names and weaker support for modern data types.

Add basic progress messages and error handling

For real folders, add status messages and catch errors per file.

failed_files = []

for shp_path in shapefiles:
    print(f"Processing: {shp_path.name}")

    try:
        gdf = gpd.read_file(shp_path)
        result = process_gdf(gdf)

        output_path = output_folder / f"{shp_path.stem}_processed.shp"
        result.to_file(output_path)

        print(f"Saved: {output_path.name}")

    except Exception as e:
        print(f"Failed: {shp_path.name} -> {e}")
        failed_files.append((shp_path.name, str(e)))

print("\nBatch complete")

if failed_files:
    print("Failed files:")
    for name, error in failed_files:
        print(f"- {name}: {error}")

This is a reasonable minimum for a simple batch workflow, though larger jobs usually benefit from logging and validation checks.
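As a sketch of what swapping print for the standard logging module could look like, the loop can be wrapped in a small runner. The process argument stands in for whatever read-process-write step you use per file:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("batch")

def run_batch(paths, process):
    """Run `process` on each path, logging progress and failures
    instead of printing. Returns a list of (path, error) failures."""
    failed = []
    for path in paths:
        log.info("Processing %s", path)
        try:
            process(path)
            log.info("Finished %s", path)
        except Exception as e:
            log.error("Failed %s: %s", path, e)
            failed.append((path, str(e)))
    return failed
```

The advantage over print is that log level, format, and destination (console vs. file) become configurable without touching the loop.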

Code examples

Example 1: Batch read and write shapefiles from one folder

This example shows the basic read-and-write loop with no extra GIS transformation.

from pathlib import Path
import geopandas as gpd

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/written_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

for shp_path in input_folder.glob("*.shp"):
    gdf = gpd.read_file(shp_path)
    output_path = output_folder / shp_path.name
    gdf.to_file(output_path)

Example 2: Batch reproject shapefiles to a target CRS

A realistic task is converting many files to one CRS before later analysis.

from pathlib import Path
import geopandas as gpd

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/reprojected_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

target_crs = "EPSG:4326"

for shp_path in input_folder.glob("*.shp"):
    gdf = gpd.read_file(shp_path)

    if gdf.crs is None:
        print(f"Skipping {shp_path.name}: missing CRS")
        continue

    gdf = gdf.to_crs(target_crs)
    output_path = output_folder / f"{shp_path.stem}_wgs84.shp"
    gdf.to_file(output_path)

Example 3: Batch add or update an attribute field

This is useful for standardizing datasets before merging or reporting.

from pathlib import Path
import geopandas as gpd

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/standardized_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

for shp_path in input_folder.glob("*.shp"):
    gdf = gpd.read_file(shp_path)

    gdf["src_file"] = shp_path.stem

    if "status" in gdf.columns:
        gdf["status"] = gdf["status"].astype(str).str.upper()

    output_path = output_folder / f"{shp_path.stem}_updated.shp"
    gdf.to_file(output_path)

Example 4: Batch process with try/except and progress reporting

This version is safer for larger folders.

from pathlib import Path
import geopandas as gpd

input_folder = Path("data/input_shapefiles")
output_folder = Path("data/output_shapefiles")
output_folder.mkdir(parents=True, exist_ok=True)

failed = []

def process_gdf(gdf, file_name):
    gdf = gdf.copy()

    if gdf.crs is None:
        raise ValueError("Missing CRS")

    gdf = gdf.to_crs("EPSG:3857")
    gdf["src_name"] = file_name
    return gdf

for shp_path in input_folder.glob("*.shp"):
    print(f"Processing {shp_path.name}...")

    try:
        gdf = gpd.read_file(shp_path)
        result = process_gdf(gdf, shp_path.stem)

        output_path = output_folder / f"{shp_path.stem}_processed.shp"
        result.to_file(output_path)

        print(f"Saved {output_path.name}")

    except Exception as e:
        failed.append((shp_path.name, str(e)))
        print(f"Error in {shp_path.name}: {e}")

print(f"\nDone. Failed files: {len(failed)}")

Explanation

This approach works because most GIS batch jobs are just file loops plus one reusable transformation.

The core pieces are:

  • a folder of input shapefiles
  • a loop over file paths
  • a processing function that handles one GeoDataFrame
  • an output folder for clean results

That makes the workflow easy to test. First, make sure the function works on one shapefile. Then run it across the whole folder.

GeoPandas is usually enough when you need to process a moderate number of vector files with standard operations. For larger pipelines, you may also want:

  • structured logging instead of print
  • validation checks before saving
  • GeoPackage output instead of shapefile
  • batch processing across subfolders with rglob("*.shp")

If you plan to merge outputs later, consistent CRS and field names matter more than the loop itself.
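If merging is the end goal, the per-file results can be combined with pandas concat. This is a sketch that assumes all inputs already share the same CRS and schema; the merge_outputs name is illustrative:

```python
import pandas as pd
import geopandas as gpd

def merge_outputs(gdfs):
    """Concatenate GeoDataFrames that share one CRS into a single layer.
    Raises if the inputs have mixed CRS, rather than merging silently."""
    crs_set = {gdf.crs for gdf in gdfs}
    if len(crs_set) != 1:
        raise ValueError(f"Inputs have mixed CRS: {crs_set}")
    return gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs=gdfs[0].crs)
```

Failing loudly on mixed CRS is deliberate: a merged layer with inconsistent coordinates is worse than no merge at all.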

Edge cases or notes

Some files are missing CRS information

If a file has no CRS, to_crs() will fail, because GeoPandas cannot know which coordinate system the raw coordinates are in. You need to know the correct original CRS and assign it first with set_crs().
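A sketch of that assign-then-reproject step is below. EPSG:27700 is only an example of a CRS you might know the data was delivered in; substitute the one that actually applies to your files:

```python
def ensure_crs(gdf, assumed_crs="EPSG:27700"):
    """Assign `assumed_crs` to a CRS-less GeoDataFrame, then reproject.
    Only do this when you actually know the original CRS of the data."""
    if gdf.crs is None:
        gdf = gdf.set_crs(assumed_crs)
    return gdf.to_crs("EPSG:4326")
```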

Mixed geometry types across shapefiles

A batch folder may contain points, lines, and polygons. Some workflows work for all geometry types, but others do not. For example, polygon area calculations do not make sense for point layers.
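One way to guard against this is to inspect the geometry types before applying a polygon-only step. A minimal check, which you could call with set(gdf.geom_type):

```python
POLYGON_TYPES = {"Polygon", "MultiPolygon"}

def supports_area(geom_types):
    """True only when every geometry type in the layer is polygonal,
    so that area calculations make sense."""
    return bool(geom_types) and set(geom_types) <= POLYGON_TYPES
```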

Field name length limits in shapefiles

Shapefiles have a 10-character field name limit. Longer column names are truncated when you save output, which can produce duplicate field names if several columns share the same first 10 characters.
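To keep the truncation under your control, you can shorten names yourself before saving. This is a sketch; it truncates to the limit and adds a numeric suffix to resolve collisions:

```python
def shorten_field_names(columns, limit=10):
    """Map column names to unique names within the shapefile limit.
    Names that collide after truncation get a numeric suffix."""
    mapping = {}
    used = set()
    for name in columns:
        short = name[:limit]
        n = 1
        while short in used:
            suffix = str(n)
            short = name[: limit - len(suffix)] + suffix
            n += 1
        mapping[name] = short
        used.add(short)
    return mapping
```

Apply it with gdf = gdf.rename(columns=shorten_field_names(gdf.columns)) just before to_file().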

Invalid geometries

Spatial operations can fail on broken geometries. If clipping, overlay, or buffering fails, check whether some features are invalid and repair them before running the full batch.
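A common repair sketch is the zero-width buffer trick shown below; recent GeoPandas versions also offer a make_valid method, but buffer(0) is a long-standing approach for polygon layers:

```python
def repair_geometries(gdf):
    """Replace invalid geometries with a zero-width buffer repair,
    leaving already-valid features untouched."""
    gdf = gdf.copy()
    invalid = ~gdf.geometry.is_valid
    if invalid.any():
        gdf.loc[invalid, "geometry"] = gdf.loc[invalid, "geometry"].buffer(0)
    return gdf
```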

Overwriting outputs accidentally

Avoid writing processed files back into the same folder with the same names. Use a separate destination folder and add a suffix such as _processed.

Related tasks:

If you need the basics first, or if some files fail to load, see How to Read a Shapefile in Python with GeoPandas to verify that each input file can be opened correctly.

FAQ

How do I loop through all shapefiles in a folder with Python?

Use pathlib and glob:

from pathlib import Path

for shp_path in Path("data/input").glob("*.shp"):
    print(shp_path.name)

This is the standard way to loop through shapefiles in Python for folder-based GIS automation.

Can GeoPandas batch process shapefiles in subfolders too?

Yes. Use rglob("*.shp") instead of glob("*.shp").

from pathlib import Path

for shp_path in Path("data/input").rglob("*.shp"):
    print(shp_path)

This is useful when your data is organized by region, date, or project folder.
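When using rglob, you can mirror the input folder structure in the output folder rather than flattening everything into one directory. A sketch of the path arithmetic:

```python
from pathlib import Path

def mirrored_output_path(shp_path, input_folder, output_folder):
    """Rebuild the input subfolder structure under the output folder."""
    relative = shp_path.relative_to(input_folder)
    return output_folder / relative
```

Before writing, create the destination subfolder with out_path.parent.mkdir(parents=True, exist_ok=True).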

Should I overwrite the original shapefiles or save new copies?

Save new copies unless you have a very controlled workflow. A separate output folder is safer and makes it easier to review results.

Is shapefile the best output format for batch GIS workflows?

Not always. Shapefile is still common, but it has field name limits and other restrictions. For many modern workflows, GeoPackage is a better output format, especially when schema consistency matters.