Tutorial: Regridding 2D NetCDF Datasets in Python with xESMF
Author: Callum Smith, Jan 2026 (last updated: Jan 2026)
Regridding (also called remapping or resampling) is a common task in geosciences, especially when working with gridded data such as satellite or climate model outputs. The goal is to interpolate data from one grid to another, which is essential for comparing datasets, combining products, or preparing data for models.
In this tutorial, we'll use the Python package xESMF to regrid 2D NetCDF datasets. xESMF is built on top of xarray and ESMF, providing a simple interface for regridding with various algorithms.
Prerequisites
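If they are not already installed, install the required packages from conda-forge (netCDF4 is needed to read and write NetCDF files, dask for the chunked workflow in Section 4, and matplotlib for the plots):
mamba install -c conda-forge xarray xesmf netcdf4 dask matplotlib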
1. Loading a NetCDF Dataset
We'll use xarray to open NetCDF files. Here, we assume you have a 2D variable (e.g., satellite data) with latitude and longitude coordinates.
import xarray as xr
ds = xr.open_dataset('input_data.nc')
print(ds)
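If you do not have a suitable file at hand, a synthetic dataset can stand in for input_data.nc while you follow along. The sketch below is only an illustration: the 2-degree grid, the analytic field, and the name ds_demo are assumptions, not part of any real product.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-degree source grid (cell centres), for illustration only
lat = np.arange(-89.0, 90.0, 2.0)   # 90 latitudes
lon = np.arange(1.0, 360.0, 2.0)    # 180 longitudes

# Smooth synthetic field that varies with both latitude and longitude
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.cos(np.deg2rad(lat2d)) * np.sin(np.deg2rad(2 * lon2d))

ds_demo = xr.Dataset(
    {"your_variable": (("lat", "lon"), field)},
    coords={"lat": lat, "lon": lon},
)
print(ds_demo)
```

Using ds_demo in place of ds in the later steps should work for the methods that only need cell centres.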
Plotting the Input Data
import matplotlib.pyplot as plt
ds['your_variable'].plot()
plt.title('Original Data on Source Grid')
plt.show()
2. Defining the Target Grid
You need to define the grid you want to regrid to. This can be another dataset's grid, or you can create a new one. Here, we create a regular lat/lon grid:
import numpy as np
target_grid = xr.Dataset({
'lat': (['lat'], np.arange(-90, 90.1, 1.0)),
'lon': (['lon'], np.arange(0, 360, 1.0)),
})
Visualizing the Target Grid
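# lat and lon are 1D arrays of different lengths, so expand them to 2D before plotting
lon2d, lat2d = np.meshgrid(target_grid['lon'], target_grid['lat'])
plt.figure()
plt.scatter(lon2d.ravel(), lat2d.ravel(), s=1)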
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Target Grid Points')
plt.show()
3. Regridding with xESMF
xESMF supports several regridding methods: bilinear, conservative, conservative_normed, patch, nearest_s2d, and nearest_d2s.
Choosing a Regridding Algorithm
The choice of algorithm depends on your data and scientific goals:
- Bilinear (bilinear): Uses weighted averages of the four nearest grid points. It is smooth and works well for continuous variables (e.g., temperature, pressure). However, it does not conserve the total sum of the variable, so it is not suitable for fluxes or quantities where conservation is important.
- Conservative (conservative): Ensures that the area integral (sum) of the variable is preserved during regridding. This is essential for variables like precipitation, runoff, or any fluxes. It requires both source and target grids to define cell boundaries (i.e., each grid must provide cell edges as well as cell centers). It is sensitive to missing values (NaNs): a NaN in any source cell that overlaps a destination cell can make that destination cell NaN.
- Conservative-normed (conservative_normed): Similar to conservative, but normalizes the weights so that if some source cells are missing, the valid part of the destination cell is still used. This is especially useful for satellite data or observational products with missing values, as it avoids propagating NaNs unnecessarily. Use this when you want conservation but need to handle missing data robustly.
- Nearest neighbor (nearest_s2d / nearest_d2s): nearest_s2d assigns each destination point the value of its nearest source point. nearest_d2s instead maps each source point to its nearest destination point, so a destination point may receive input from several source points or from none. These methods are fast and preserve original values, but can introduce blocky artifacts. Use them for categorical data (e.g., land/sea masks, land cover types) or when you want to avoid interpolation.
- Patch (patch): A higher-order method that can provide smoother results for some variables, but is less commonly used and more computationally intensive.
Summary Table
| Algorithm | Preserves Integrals | Handles NaNs Well | Smooth | Use For |
|---|---|---|---|---|
| bilinear | No | Moderate | Yes | Continuous fields |
| conservative | Yes | No | No | Fluxes, precipitation |
| conservative_normed | Yes | Yes (with a mask) | No | Fluxes with missing data |
| nearest_s2d/d2s | No | Yes | No | Categorical, masks |
| patch | No | Moderate | Yes | Advanced, smooth fields |
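To make the "Preserves Integrals" column concrete, here is a toy 1D comparison written in plain NumPy. It is a sketch of the idea only, not xESMF's implementation: the cell edges and the single-spike field are invented for illustration, and the "conservative" result is simply an overlap-weighted average.

```python
import numpy as np

# Six source cells of width 1 and two destination cells of width 3
src_edges = np.arange(0.0, 7.0, 1.0)
dst_edges = np.array([0.0, 3.0, 6.0])
src_vals = np.array([0.0, 0.0, 9.0, 0.0, 0.0, 0.0])  # a localized spike

# overlap[i, j] = overlap length between destination cell i and source cell j
lo = np.maximum(dst_edges[:-1, None], src_edges[None, :-1])
hi = np.minimum(dst_edges[1:, None], src_edges[None, 1:])
overlap = np.clip(hi - lo, 0.0, None)

# Conservative: overlap-weighted average over each destination cell
dst_width = np.diff(dst_edges)
conservative = overlap @ src_vals / dst_width          # [3., 0.]

# Nearest neighbor: take the source cell whose centre is closest
src_centres = 0.5 * (src_edges[:-1] + src_edges[1:])
dst_centres = 0.5 * (dst_edges[:-1] + dst_edges[1:])
nearest_idx = np.abs(dst_centres[:, None] - src_centres[None, :]).argmin(axis=1)
nearest = src_vals[nearest_idx]                        # [0., 0.]

# Compare integrals (value times cell width, summed)
src_integral = (src_vals * np.diff(src_edges)).sum()
cons_integral = (conservative * dst_width).sum()
near_integral = (nearest * dst_width).sum()
print(src_integral, cons_integral, near_integral)      # 9.0 9.0 0.0
```

The conservative result spreads the spike over the coarse cell but keeps the total, while point sampling misses it entirely. This is why flux-like quantities are usually regridded conservatively.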
Example: Bilinear Regridding
import xesmf as xe
# For a global grid, also pass periodic=True to avoid a gap at the longitude seam
regridder = xe.Regridder(ds, target_grid, 'bilinear')
regridded = regridder(ds['your_variable'])
regridded.to_netcdf('output_bilinear.nc')
Example: Conservative Regridding
# Conservative methods need cell bounds (e.g. lat_b and lon_b) on both ds and target_grid
regridder_cons = xe.Regridder(ds, target_grid, 'conservative')
regridded_cons = regridder_cons(ds['your_variable'])
regridded_cons.to_netcdf('output_conservative.nc')
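Conservative regridding relies on cell boundaries, and a grid whose centers sit exactly on the poles cannot have valid 1-degree bounds around them. One way to handle this, sketched below, is to define the target grid from its edges and derive the centers. The names lat_b and lon_b follow the convention xESMF looks for; target_grid_cons is a hypothetical name used only here.

```python
import numpy as np
import xarray as xr

# Cell edges of a 1-degree global grid
lat_b = np.arange(-90.0, 90.1, 1.0)   # 181 edges -> 180 cells
lon_b = np.arange(0.0, 360.1, 1.0)    # 361 edges -> 360 cells

# Cell centres are the midpoints between consecutive edges
lat = 0.5 * (lat_b[:-1] + lat_b[1:])
lon = 0.5 * (lon_b[:-1] + lon_b[1:])

target_grid_cons = xr.Dataset({
    'lat': (['lat'], lat),
    'lon': (['lon'], lon),
    'lat_b': (['lat_b'], lat_b),
    'lon_b': (['lon_b'], lon_b),
})
print(target_grid_cons)
```

The source dataset needs bounds in the same form. Note that the centers of this grid fall on half degrees, so it is not identical to the bilinear target grid defined earlier.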
Example: Conservative-normed Regridding
The conservative_normed method is designed to handle missing values (NaNs) more robustly than standard conservative regridding. In the standard conservative method, if any source cell that overlaps a destination cell is NaN, the entire destination cell may become NaN. The conservative_normed method normalizes the weights so that only the valid (non-missing) source cells contribute to the destination cell, preventing unnecessary propagation of NaNs.
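The difference in normalization can be illustrated with a toy 1D case in plain NumPy. This is a sketch of the idea under simplified assumptions (hand-written overlap lengths, a mask derived from NaNs), not a reproduction of what ESMF computes internally.

```python
import numpy as np

# Four source cells of width 1; the second one is missing
src_vals = np.array([4.0, np.nan, 6.0, 8.0])
valid = ~np.isnan(src_vals)

# overlap[i, j] = overlap length between destination cell i and source cell j
# (two destination cells of width 2, each covering two source cells)
overlap = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
dst_width = overlap.sum(axis=1)

filled = np.where(valid, src_vals, 0.0)        # masked cells contribute nothing
valid_overlap = overlap @ valid.astype(float)  # unmasked overlap per destination cell

# Normalize by the full destination cell: partially masked cells are biased low
by_dst_area = overlap @ filled / dst_width         # [2., 7.]

# Normalize by the unmasked overlap only: the valid part sets the value
by_valid_area = overlap @ filled / valid_overlap   # [4., 7.]

print(by_dst_area, by_valid_area)
```

The fully valid destination cell is identical in both cases. Only the partially masked cell differs, which is where the normed variant is meant to help.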
Using a 'mask' Layer
To take full advantage of conservative_normed, you should provide a variable named mask in your xarray dataset. The mask should be 2D, with the same lat/lon shape as the horizontal grid of your data, where valid points are marked as 1 (or True) and missing/invalid points as 0 (or False). xESMF uses this mask to determine which parts of the grid are valid when it computes the regridding weights.
Example of adding a mask:
# Suppose ds['your_variable'] is 2D (lat, lon) and contains NaNs for missing data
# The mask is 1 where the data are valid and 0 where they are missing
ds['mask'] = ds['your_variable'].notnull().astype(int)
# Now use conservative-normed
regridder_normed = xe.Regridder(ds, target_grid, 'conservative_normed')
regridded_normed = regridder_normed(ds['your_variable'])
regridded_normed.to_netcdf('output_conservative_normed.nc')
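If you do not provide a mask, the regridder has no information about missing data: the weights are computed from the grids alone, so NaNs in the input propagate to the destination cells they overlap. Recent xESMF versions also accept skipna=True (with an optional na_thres threshold) when calling the regridder, which renormalizes around NaNs at apply time; check the documentation for your installed version. Explicitly providing a mask remains the more robust choice for complex or irregular missing-data patterns.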
For more, see the xESMF documentation on masking.
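When the variable also has a time dimension, the mask still has to be 2D, so the missing-data pattern must be collapsed over time first. The sketch below uses a tiny synthetic array and treats a cell as valid only if it has data at every time step. That rule is an assumption chosen for illustration; using any('time') instead would be the more permissive alternative.

```python
import numpy as np
import xarray as xr

# Tiny synthetic (time, lat, lon) array with two kinds of gaps
data = np.ones((3, 2, 4))
data[:, 0, 1] = np.nan   # missing at every time step
data[1, 1, 2] = np.nan   # missing at a single time step

da = xr.DataArray(data, dims=('time', 'lat', 'lon'), name='your_variable')
ds_demo = da.to_dataset()

# 1 where the cell is valid at all times, 0 otherwise
ds_demo['mask'] = da.notnull().all('time').astype(int)
print(ds_demo['mask'].values)
```

With this rule, a cell that is missing even once is excluded from the weights for all time steps.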
Example: Nearest Neighbor Regridding
regridder_nn = xe.Regridder(ds, target_grid, 'nearest_s2d')
regridded_nn = regridder_nn(ds['your_variable'])
regridded_nn.to_netcdf('output_nearest.nc')
4. Considerations
Working with Large Datasets
Regridding large datasets (e.g., high-resolution satellite data or long time series) can be memory- and compute-intensive. Here are some tips to improve performance:
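- Use Dask for Chunking: xarray and xESMF support dask arrays, which let you process data in chunks and parallelize operations. Chunk along non-horizontal dimensions such as time and keep lat and lon whole, since each regridding step needs the full horizontal grid (older xESMF versions require this). Open your dataset with chunking:
ds = xr.open_dataset('input_data.nc', chunks={'time': 10})
# Adjust the time chunk size to fit your memory and data shape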
- Saving and Reusing Regridding Weights: When you create a regridder in xESMF, it computes a weight matrix that maps the source grid to the target grid. This computation can be slow for large grids, but you can save the weights to a file and reload them later for faster repeated regridding. This is especially useful when you need to regrid many variables or process data in chunks, as you only need to compute the weights once. Example:
# First time: compute the weights and write them to disk
regridder = xe.Regridder(ds, target_grid, 'bilinear')
regridder.to_netcdf('my_weights.nc')  # recent xESMF versions do not write the file automatically
# Next time: reuse the saved weights (much faster)
regridder = xe.Regridder(ds, target_grid, 'bilinear', filename='my_weights.nc', reuse_weights=True)
- Parallel Processing: If you have access to a cluster or multicore machine, dask can distribute the computation. Set up a dask cluster for even faster processing.
- Reduce Data Size: If possible, subset your data in time or space before regridding, or use coarser grids for exploratory analysis.
- Monitor Memory Usage: Large regridding operations can use a lot of RAM. Monitor your system and adjust chunk sizes or process data in smaller batches if needed.
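When picking chunk sizes, a quick back-of-the-envelope estimate of the memory per chunk helps. The helper below is a hypothetical convenience function, not part of xarray or dask, and it assumes 8-byte float64 values. The 0.1-degree grid dimensions are invented for illustration.

```python
import math

def chunk_megabytes(chunk_shape, itemsize=8):
    """Size of one chunk in MB (10**6 bytes); itemsize=8 assumes float64."""
    return math.prod(chunk_shape.values()) * itemsize / 1e6

# Hypothetical 0.1-degree global source grid, chunked along time only
per_chunk = chunk_megabytes({'time': 10, 'lat': 1800, 'lon': 3600})
print(f"{per_chunk:.1f} MB per chunk")   # 518.4 MB per chunk

# Halving the time chunk halves the footprint
smaller = chunk_megabytes({'time': 5, 'lat': 1800, 'lon': 3600})
print(f"{smaller:.1f} MB per chunk")     # 259.2 MB per chunk
```

Keep in mind that several chunks, plus the regridded output, are usually in memory at once, so the per-chunk figure is a lower bound on what each worker needs.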
For more details, see the xESMF documentation on dask and performance.
5. Visualizing the Results
import matplotlib.pyplot as plt
regridded.plot()
plt.title('Regridded Data (Bilinear)')
plt.show()
Comparing Input and Output
You can compare the original and regridded data side by side:
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
ds['your_variable'].plot(ax=axs[0])
axs[0].set_title('Original Data')
regridded.plot(ax=axs[1])
axs[1].set_title('Regridded Data (Bilinear)')
plt.tight_layout()
plt.show()
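For more advanced analysis, you can plot the difference between the xESMF result and xarray's built-in linear interpolation of the source data onto the same grid (interp_like requires SciPy):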
diff = regridded - ds['your_variable'].interp_like(regridded)
diff.plot()
plt.title('Difference After Regridding')
plt.show()
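A difference map is easier to interpret alongside a couple of summary numbers. The sketch below computes an area-weighted bias and RMSE using cos(latitude) weights, which approximate relative cell area on a regular lat/lon grid. The small diff_demo array is synthetic; with real data you would use the diff computed above.

```python
import numpy as np
import xarray as xr

# Synthetic difference field on a tiny lat/lon grid, for illustration only
diff_demo = xr.DataArray(
    np.array([[1.0, 1.0, 1.0],
              [0.0, 0.0, 0.0],
              [-1.0, -1.0, -1.0]]),
    coords={'lat': [-60.0, 0.0, 60.0], 'lon': [0.0, 120.0, 240.0]},
    dims=('lat', 'lon'),
)

# cos(latitude) approximates the relative area of each cell
weights = np.cos(np.deg2rad(diff_demo['lat']))

bias = float(diff_demo.weighted(weights).mean())
rmse = float(np.sqrt((diff_demo ** 2).weighted(weights).mean()))
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

Without the weights, high-latitude cells would count as much as tropical ones even though they cover far less area.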