EPSG Codes Explained: How to Choose the Right CRS in Python

Problem statement

In Python GIS work, you will often see CRS values like EPSG:4326, EPSG:3857, or a local projected system such as a UTM zone. Knowing what the number means is only half the problem. The practical question is which CRS to use for the job.

If you use the wrong CRS, common GIS tasks can produce misleading results:

  • distance and area calculations can be wrong
  • layers can appear in different places
  • spatial joins and overlays can return wrong or empty results
  • map output can look distorted or confusing

This is a common issue when reading shapefiles, GeoJSON, or data from web APIs in GeoPandas. Many users can load the data, but are not sure whether to keep the CRS, assign one, or reproject it.

Quick answer

An EPSG code is a standard identifier for a coordinate reference system (CRS).

The practical rule in Python GIS workflows is:

  • keep the CRS the source file declares when its metadata is correct
  • assign a CRS only when metadata is missing and you know what the source CRS should be
  • use a projected CRS for area and distance analysis
  • use a geographic CRS like EPSG:4326 mainly for storage, exchange, GPS, and many web/API workflows

In Python GIS, you will usually inspect CRS with GeoPandas using .crs, assign missing CRS metadata with .set_crs(), and transform coordinates with .to_crs().

Step-by-step solution

Step 1: Check the current CRS of your data

Always inspect the CRS first.

import geopandas as gpd

gdf = gpd.read_file("data/city_parks.geojson")
print(gdf.crs)

Possible results:

  • EPSG:4326
  • a full WKT CRS definition
  • None if the file has no CRS metadata (for example, a shapefile without its .prj file)

This tells you whether the dataset already has a known coordinate reference system.
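Beyond printing the CRS, it can help to look at a few of its properties. The sketch below assumes that gdf.crs is a pyproj CRS object (as it is in current GeoPandas releases); describe_crs is a hypothetical helper name, not part of GeoPandas.

```python
from pyproj import CRS


def describe_crs(crs):
    """Summarize a CRS. Accepts anything pyproj can parse, including gdf.crs."""
    crs = CRS.from_user_input(crs)
    return {
        "name": crs.name,
        "epsg": crs.to_epsg(),                # None when no EPSG match is found
        "is_geographic": crs.is_geographic,   # True for lon/lat style systems
        "is_projected": crs.is_projected,     # True for planar systems
        "unit": crs.axis_info[0].unit_name,   # unit of the first axis
    }


wgs84 = describe_crs("EPSG:4326")
utm33 = describe_crs("EPSG:32633")
print(wgs84)
print(utm33)
```

With a real dataset you would call describe_crs(gdf.crs) after confirming that gdf.crs is not None.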

Step 2: Decide whether the CRS is correct or just missing

There are two different situations:

  1. The coordinates are already correct, but CRS metadata is missing
  2. The data must be transformed into another CRS

These are not the same.

If the coordinates are longitude and latitude values and you know the file should be WGS84, assign the CRS:

gdf = gdf.set_crs("EPSG:4326")

This only labels the data. It does not change coordinates.

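If the data is already labeled correctly but you need another CRS for analysis, reproject it. The examples in this guide use EPSG:32633 (WGS 84 / UTM zone 33N, which covers 12°E to 18°E in the northern hemisphere); substitute the projected CRS that covers your own area: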

gdf_projected = gdf.to_crs("EPSG:32633")

This changes the coordinate values.

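A common mistake is assigning a new CRS with set_crs() when the data actually needs to_crs(). Relabeling leaves the coordinate values untouched, so the layer ends up in the wrong place. GeoPandas raises an error if you call set_crs() with a different CRS on data that already has one, unless you pass allow_override=True.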
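The difference is easy to see on a single in-memory point. This is a minimal sketch with made-up sample data (one point at 15°E, 52°N) rather than a real file; the variable names are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# One sample point with no CRS metadata (x = longitude, y = latitude)
raw = gpd.GeoDataFrame({"name": ["sample"]}, geometry=[Point(15.0, 52.0)])

# set_crs() only labels the data: the coordinates stay the same
labeled = raw.set_crs("EPSG:4326")

# to_crs() transforms the coordinates into the target CRS (meters here)
projected = labeled.to_crs("EPSG:32633")

print(labeled.geometry.iloc[0])
print(projected.geometry.iloc[0])

# Relabeling already-labeled data with a different CRS is refused by default
try:
    labeled.set_crs("EPSG:32633")
    relabel_error = None
except ValueError as exc:
    relabel_error = exc

print("Relabel refused:", relabel_error is not None)
```

The labeled point still prints as degrees, while the projected point prints as large meter values.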

Step 3: Match the CRS to the task

Choose the CRS based on what you are doing.

Web maps and GPS data

These workflows usually involve:

  • EPSG:4326 for latitude/longitude data
  • EPSG:3857 for web map display

Distance and area analysis

Use a projected CRS with linear units such as meters or feet. Good choices are often:

  • local UTM zones
  • official national projected systems
  • regional projected CRS used by your organization
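If you do not know which UTM zone covers your data, GeoPandas can suggest one. The sketch below uses estimate_utm_crs() on two made-up points near Berlin; treat the result as a reasonable default for small areas, not a replacement for an official national CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative sample points in longitude/latitude
sites = gpd.GeoDataFrame(
    {"site": ["a", "b"]},
    geometry=[Point(13.40, 52.52), Point(13.45, 52.50)],
    crs="EPSG:4326",
)

# Ask GeoPandas for the UTM zone that matches the data's location
utm_crs = sites.estimate_utm_crs()
sites_utm = sites.to_crs(utm_crs)

print(utm_crs.to_epsg(), utm_crs.axis_info[0].unit_name)
```

Data that spans several UTM zones usually calls for a national or regional projected CRS instead.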

Multi-layer analysis

If you are doing overlays, joins, buffering, or measurement, put all layers into one shared projected CRS when possible.
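To illustrate why a shared CRS matters, the following sketch joins one sample point to one sample polygon twice: once with mismatched CRS and once after alignment. The data is made up for the example, and the warning filter is there only because GeoPandas warns about the mismatch.

```python
import warnings

import geopandas as gpd
from shapely.geometry import Point, box

# A point in lon/lat and a polygon stored in UTM zone 33N
points = gpd.GeoDataFrame(
    {"pid": [1]}, geometry=[Point(15.005, 52.005)], crs="EPSG:4326"
)
zones = gpd.GeoDataFrame(
    {"zone": ["A"]}, geometry=[box(15.0, 52.0, 15.01, 52.01)], crs="EPSG:4326"
).to_crs("EPSG:32633")

# Joining layers with different CRS compares unrelated numbers
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the CRS mismatch warning
    mismatched = gpd.sjoin(points, zones)

# Align the point layer to the polygon layer first, then join
aligned = gpd.sjoin(points.to_crs(zones.crs), zones)

print("Matches with mismatched CRS:", len(mismatched))
print("Matches after alignment:", len(aligned))
```

The point lies inside the polygon on the ground, but only the aligned join finds it.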

Step 4: Reproject data when needed

Reprojection is required when the current CRS is not suitable for the task.

Example: convert polygon data from WGS84 to a projected CRS before calculating area.

import geopandas as gpd

districts = gpd.read_file("data/districts.geojson")
print(districts.crs)

districts_m = districts.to_crs("EPSG:32633")
districts_m["area_sq_m"] = districts_m.area

print(districts_m[["area_sq_m"]].head())

If you calculate area directly in EPSG:4326, the results are in square degrees, which cannot be read as square meters. GeoPandas also emits a warning when you compute area in a geographic CRS.
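A small in-memory comparison makes the difference concrete. The sketch below uses a made-up 0.01 by 0.01 degree cell near 15°E, 52°N and measures it in both CRS.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# A 0.01 x 0.01 degree cell in lon/lat
cell = gpd.GeoDataFrame(
    geometry=[box(15.0, 52.0, 15.01, 52.01)], crs="EPSG:4326"
)

# Area in a geographic CRS: square degrees (GeoPandas warns about this)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    area_deg2 = cell.area.iloc[0]

# Area after reprojecting to UTM zone 33N: square meters
area_m2 = cell.to_crs("EPSG:32633").area.iloc[0]

print(f"Geographic CRS: {area_deg2:.6f} square degrees")
print(f"Projected CRS:  {area_m2:,.0f} square meters")
```

The first number is tiny and has no direct ground meaning; the second is on the order of three quarters of a square kilometer.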

Step 5: Verify the result before analysis

After assigning or transforming CRS, check that the result makes sense.

import geopandas as gpd

roads = gpd.read_file("data/roads.shp")
parcels = gpd.read_file("data/parcels.shp")

print(roads.crs)
print(parcels.crs)

parcels = parcels.to_crs(roads.crs)

print(roads.total_bounds)
print(parcels.total_bounds)

Useful checks:

  • do layers now overlap in the expected area?
  • are the units meters or feet if you need measurement?
  • do bounds look reasonable for the location?
  • does a sample buffer or distance result look realistic?
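Some of these checks can be automated. The helper below is a rough sketch, not a GeoPandas function: check_layers is a hypothetical name, and it only compares CRS, units, and bounding boxes, so a visual check is still worthwhile.

```python
import geopandas as gpd
from shapely.geometry import LineString, box


def check_layers(a, b):
    """Report whether two layers share a CRS, use linear units, and overlap."""
    same_crs = a.crs is not None and a.crs == b.crs
    return {
        "same_crs": same_crs,
        "projected": bool(same_crs and a.crs.is_projected),
        "unit": a.crs.axis_info[0].unit_name if a.crs is not None else None,
        # Compare bounding boxes only; this is a coarse overlap test
        "bounds_overlap": box(*a.total_bounds).intersects(box(*b.total_bounds)),
    }


# Made-up sample layers: roads stored in UTM, parcels in lon/lat
roads = gpd.GeoDataFrame(
    geometry=[LineString([(15.002, 52.002), (15.008, 52.008)])], crs="EPSG:4326"
).to_crs("EPSG:32633")
parcels = gpd.GeoDataFrame(
    geometry=[box(15.0, 52.0, 15.01, 52.01)], crs="EPSG:4326"
)

before = check_layers(roads, parcels)
after = check_layers(roads, parcels.to_crs(roads.crs))
print(before)
print(after)
```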

Code examples

Example 1: Read a file and inspect its CRS

import geopandas as gpd

buildings = gpd.read_file("data/buildings.shp")
print("CRS:", buildings.crs)

This confirms whether CRS metadata exists.

Example 2: Assign a missing EPSG code to data

import geopandas as gpd

points = gpd.read_file("data/gps_points.geojson")

if points.crs is None:
    points = points.set_crs("EPSG:4326")

print(points.crs)

Use this only when the coordinates are already in that CRS and the metadata is missing.

Example 3: Reproject data to a projected CRS for analysis

import geopandas as gpd

points = gpd.read_file("data/gps_points.geojson")

if points.crs is None:
    points = points.set_crs("EPSG:4326")

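# Reproject to a meter-based CRS (UTM zone 33N here; use the zone for your area)
points_utm = points.to_crs("EPSG:32633")

# Buffer distance is in CRS units, so 500 means 500 meters
points_utm["buffer_500m"] = points_utm.buffer(500)
print(points_utm.crs)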

This is the correct pattern when you need meter-based analysis.

Example 4: Confirm layer alignment after reprojection

import geopandas as gpd

rivers = gpd.read_file("data/rivers.shp")
catchments = gpd.read_file("data/catchments.geojson")

catchments_aligned = catchments.to_crs(rivers.crs)

print("Rivers bounds:", rivers.total_bounds)
print("Catchments bounds:", catchments_aligned.total_bounds)

If the bounds now fall in the same region and the map overlays correctly, the reprojection likely worked.

Explanation

An EPSG code is a short identifier for a full CRS definition. In Python libraries, you will usually see it written like this:

"EPSG:4326"

That code tells GeoPandas, pyproj, and other GIS tools how coordinates relate to real places on Earth. Shapely geometries themselves carry no CRS; GeoPandas stores it alongside the geometry column.

The most important practical distinction is between geographic and projected CRS:

  • Geographic CRS uses angular units, usually degrees
  • Projected CRS uses linear units, usually meters or feet

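This matters because GeoPandas geometry operations are planar: they treat coordinates as plain x/y numbers. If you measure distance or area in a geographic CRS, the results come out in degrees or square degrees, and a degree of longitude covers less ground the farther you are from the equator, so the numbers are not suitable for analysis.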

You will see a few EPSG codes often:

  • EPSG:4326 — WGS84 latitude/longitude; common for GPS, GeoJSON, and APIs
  • EPSG:3857 — Web Mercator; common for web maps and basemaps
  • UTM or national projected EPSG codes — better for local analysis and measurement
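The WGS 84 UTM codes follow a regular pattern: 326 plus the zone number in the northern hemisphere and 327 plus the zone number in the southern hemisphere. The function below is a simplified sketch of that pattern (utm_epsg_from_lonlat is a hypothetical name); it ignores the special zone boundaries around Norway and Svalbard.

```python
import math


def utm_epsg_from_lonlat(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a longitude/latitude in degrees."""
    # Zones are 6 degrees wide, numbered 1 to 60 starting at 180 degrees west
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_from_lonlat(15.0, 52.0))     # central Europe
print(utm_epsg_from_lonlat(-122.4, 37.8))   # San Francisco area
print(utm_epsg_from_lonlat(151.2, -33.9))   # Sydney area
```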

The workflow is simple:

  1. inspect the CRS with .crs
  2. decide whether the CRS is missing or whether transformation is needed
  3. choose the CRS based on the task
  4. verify alignment and units before analysis
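The workflow above can be sketched as one helper. This is an illustration under assumptions, not a standard GeoPandas function: prepare_for_measurement and its assumed_crs parameter are hypothetical names, and falling back to an estimated UTM zone is just one possible choice of projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point


def prepare_for_measurement(gdf, assumed_crs=None):
    """Return a copy of gdf in a projected CRS suitable for local measurement."""
    # Steps 1-2: inspect the CRS and label the data only if metadata is missing
    if gdf.crs is None:
        if assumed_crs is None:
            raise ValueError(
                "CRS metadata is missing; pass assumed_crs only if you know "
                "the CRS the coordinates were recorded in"
            )
        gdf = gdf.set_crs(assumed_crs)
    # Step 3: measurement needs linear units, so reproject geographic data
    if gdf.crs.is_geographic:
        gdf = gdf.to_crs(gdf.estimate_utm_crs())
    return gdf


# Made-up sample: an unlabeled lon/lat point near Berlin
unlabeled = gpd.GeoDataFrame(geometry=[Point(13.40, 52.52)])
ready = prepare_for_measurement(unlabeled, assumed_crs="EPSG:4326")

# Step 4: verify the units before analysis
print(ready.crs.to_epsg(), ready.crs.axis_info[0].unit_name)
```

Data that is already in a projected CRS passes through unchanged.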

The key distinction is:

  • set_crs() assigns metadata
  • to_crs() transforms coordinates

In practice: if your data is in WGS84 and you need accurate area or distance, reproject it to a projected CRS first. If your file has no CRS metadata but you know what it should be, assign the correct CRS without changing coordinates.

Edge cases and notes

Files without an EPSG code but with a valid CRS definition

Some files store CRS as WKT or PROJ text rather than a simple EPSG code. GeoPandas may still read this correctly. You do not always need a numeric EPSG value if the CRS definition is valid.
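As an illustration, the sketch below builds a custom projected CRS from a PROJ string and reprojects into it. The projection (an equal-area projection centered on 15°E, 52°N) is invented for the example; for a definition like this, to_epsg() may return None because no EPSG entry matches.

```python
import geopandas as gpd
from pyproj import CRS
from shapely.geometry import Point

# A custom CRS defined by a PROJ string instead of an EPSG code
local_crs = CRS.from_proj4(
    "+proj=laea +lat_0=52 +lon_0=15 +datum=WGS84 +units=m +no_defs"
)

print("EPSG code:", local_crs.to_epsg())  # may be None for custom definitions
print("Projected:", local_crs.is_projected)

# GeoPandas can still reproject into it
origin = gpd.GeoSeries([Point(15.0, 52.0)], crs="EPSG:4326").to_crs(local_crs)
print(origin.iloc[0])
```

The sample point sits on the projection center, so it lands at (or extremely close to) x = 0, y = 0 in the custom CRS.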

Data from different sources may use similar coordinate values

Two datasets can appear similar and still use different CRS. Do not rely on visual similarity alone. Check .crs for every layer.

Not every projected CRS is suitable for every analysis

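A projected CRS is not automatically the right one. Every projection distorts something, and each is designed to keep that distortion small only within its intended area. Prefer local or official projected systems that cover your study area.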

For example, if you are measuring parcel areas in one city, a local UTM zone or official local projected CRS is usually a better choice than a global web map CRS.

EPSG:3857 is useful for display, not precise measurement

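EPSG:3857 is standard for web maps, but it is usually not the best choice for precise distance or area calculations. Its units are nominally meters, yet distances and areas are increasingly exaggerated the farther you move from the equator.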
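The size of the effect can be estimated with the spherical Mercator scale factor, 1 / cos(latitude). The sketch below uses that formula as an approximation for EPSG:3857; web_mercator_scale is a hypothetical helper name.

```python
import math


def web_mercator_scale(lat_deg):
    """Approximate length exaggeration in Web Mercator at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 45, 60):
    k = web_mercator_scale(lat)
    # Lengths scale by k, areas by k squared
    print(f"latitude {lat:>2}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

At 60 degrees latitude, lengths come out roughly doubled and areas roughly quadrupled.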

Invalid geometries can affect results

Even with the correct CRS, invalid polygons or broken geometries can cause overlay or measurement issues. If results look wrong, also validate your geometry, not just the CRS.
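A quick validity check can be sketched like this. The self-intersecting "bowtie" polygon is made up for the example, and make_valid from Shapely is one possible repair; always inspect repaired geometries before relying on them.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.validation import make_valid

# A self-intersecting "bowtie" polygon: a typical invalid geometry
bowtie = Polygon([(0, 0), (10, 10), (10, 0), (0, 10)])
parcels = gpd.GeoDataFrame({"pid": [1]}, geometry=[bowtie], crs="EPSG:32633")

# Flag invalid rows
print("Valid before repair:", parcels.is_valid.tolist())

# Repair each geometry and keep the original index and CRS
fixed = gpd.GeoSeries(
    [make_valid(geom) for geom in parcels.geometry],
    index=parcels.index,
    crs=parcels.crs,
)
repaired = parcels.set_geometry(fixed)

print("Valid after repair:", repaired.is_valid.tolist())
print("Area after repair:", repaired.area.iloc[0])
```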

To understand the broader concept, see Coordinate Reference Systems (CRS) Explained for Python GIS.

For related tasks, read GeoPandas Basics: Working with Spatial Data in Python and Python for GIS: What It Is and When to Use It.

FAQ

What is the difference between an EPSG code and a CRS?

A CRS is the full coordinate reference system definition. An EPSG code is a standard identifier for one specific CRS.

When should I use EPSG:4326 in Python GIS?

Use EPSG:4326 for latitude/longitude data, GPS data, GeoJSON, and many API or storage workflows. It is usually not the best choice for distance or area analysis.

How do I know if I should set a CRS or reproject the data?

Use set_crs() when coordinates are already correct and only the metadata is missing. Use to_crs() when you need to transform the coordinates into another CRS.

Why are my distance or area calculations wrong after loading data in GeoPandas?

You are likely working in a geographic CRS such as EPSG:4326. Reproject to a suitable projected CRS before measuring.