xESMF backend usage and benchmark

xESMF isn’t just a wrapper of ESMPy. It uses ESMPy only to generate regridding weights, and has its own Scipy-based method for applying them (see more about regridding weights).
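Conceptually, applying regridding weights is just a sparse matrix multiplication over the flattened grid points. Below is a minimal, self-contained illustration with made-up weights (hypothetical numbers, not xESMF’s actual code path):

import numpy as np
import scipy.sparse as sps

n_in, n_out = 6, 4  # tiny stand-in grid sizes
weights = sps.random(n_out, n_in, density=0.5, format="coo")  # fake weight matrix
field_in = np.arange(n_in, dtype=float)  # flattened input field
field_out = weights.dot(field_in)  # regridding = sparse matrix-vector product
field_out.shape  # (4,)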

We use the Scipy method because its serial performance is much higher than that of ESMPy’s own engine, and it can also reuse weights (issue#2). ESMPy’s native method is still available in the backend, mainly for benchmarking the Scipy results in unit tests.

Here we show how to use the xESMF backend and compare the performance of the two methods. The backend is still fairly easy to use compared to the original ESMPy; it just doesn’t have a fancy API and cannot deal with xarray metadata.

[1]:
import os
import numpy as np
import xesmf as xe

# backend functions
from xesmf.backend import (
    Grid,
    esmf_regrid_build,
    esmf_regrid_apply,
)
from xesmf.smm import read_weights, apply_weights

Prepare data

We use the same data as in the reusing regridder example, but convert the xarray Dataset to pure numpy arrays to work with the backend.

[2]:
ds_in = xe.util.grid_2d(
    -120, 120, 0.4,  # longitude range and resolution
    -60, 60, 0.3,  # latitude range and resolution
)
ds_out = xe.util.grid_2d(-120, 120, 0.6, -60, 60, 0.4)
ds_in.coords["time"] = np.arange(1, 11)
ds_in.coords["lev"] = np.arange(1, 51)
ds_in["data2D"] = xe.data.wave_smooth(ds_in["lon"], ds_in["lat"])
ds_in["data4D"] = ds_in["time"] * ds_in["lev"] * ds_in["data2D"]
[3]:
# the backend only accepts pure numpy arrays
lon_in = ds_in["lon"].values
lat_in = ds_in["lat"].values

lon_out = ds_out["lon"].values
lat_out = ds_out["lat"].values

data_in = ds_in["data4D"].values
data_in.shape
[3]:
(10, 50, 400, 600)

Make ESMF Grid objects

[4]:
grid_in = Grid.from_xarray(lon_in.T, lat_in.T)
grid_out = Grid.from_xarray(lon_out.T, lat_out.T)

This is a native ESMPy Grid object:

[5]:
type(grid_in)
[5]:
xesmf.backend.Grid

We pass the transpose (lon.T) because ESMPy prefers Fortran-ordering to C-ordering (see this issue).

[6]:
lon_in.flags  # numpy arrays are C-ordered by default
[6]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
[7]:
lon_in.T.flags  # a memory view of its transpose is Fortran-ordered
[7]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
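
Passing the transpose therefore costs essentially nothing: it is a Fortran-ordered view of the same memory, whereas an explicit Fortran-ordered copy (e.g. with np.asfortranarray) would allocate new memory. A quick illustration with a plain numpy array, independent of xESMF:

a = np.zeros((3, 4))
print(a.T.flags["F_CONTIGUOUS"], a.T.base is a)  # True True -> just a view, no copy
b = np.asfortranarray(a)  # explicit Fortran-ordered copy
print(b.flags["F_CONTIGUOUS"], b.base is a, b.flags["OWNDATA"])  # True False True -> new memory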

Compute weights

[8]:
filename = "test_weights.nc"  # weight filename
if os.path.exists(filename):
    os.remove(filename)  # ESMPy will complain if the file exists

Computing weights takes a few seconds, as in the reusing regridder example.

[9]:
%%time
regrid = esmf_regrid_build(grid_in, grid_out, 'bilinear',
                           extra_dims=[50, 10],  # reversed to Fortran-ordering
                           filename=filename)
CPU times: user 4.08 s, sys: 207 ms, total: 4.29 s
Wall time: 4.35 s

It returns a native ESMPy Regrid object:

[10]:
type(regrid)
[10]:
ESMF.api.regrid.Regrid

It also writes weights to disk so we can then read them back for Scipy.

[11]:
%%bash
ncdump -h test_weights.nc
netcdf test_weights {
dimensions:
        n_s = 480000 ;
variables:
        double S(n_s) ;
        int col(n_s) ;
        int row(n_s) ;
}

Apply weights using ESMPy backend

Applying the weights with ESMPy’s native method takes several seconds.

[12]:
%%time
data_out_esmpy = esmf_regrid_apply(regrid, data_in.T).T
CPU times: user 1.59 s, sys: 2.51 s, total: 4.1 s
Wall time: 12.3 s

The first .T converts C-ordering to Fortran-ordering for ESMPy, and the second .T converts the result back to C-ordering. Transposing only creates a memory view, so it incurs almost no overhead.

[13]:
data_out_esmpy.flags
[13]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
[14]:
data_out_esmpy.shape  # broadcasted over extra dimensions
[14]:
(10, 50, 300, 400)

Apply weights using Scipy backend

Read the weights back for Scipy. read_weights needs to know the shape of the sparse matrix, i.e. how many points are in the input and output grids.

[15]:
weights = read_weights(filename, lon_in.size, lon_out.size)
weights
[15]:
<120000x240000 sparse matrix of type '<class 'numpy.float64'>'
        with 480000 stored elements in COOrdinate format>
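
For reference, read_weights is roughly equivalent to building the COO matrix by hand from the S, row and col variables we inspected with ncdump above. A simplified sketch (the indices in the file are 1-based):

import xarray as xr
import scipy.sparse as sps

ds_w = xr.open_dataset(filename)
weights_manual = sps.coo_matrix(
    (ds_w["S"].values, (ds_w["row"].values - 1, ds_w["col"].values - 1)),
    shape=(lon_out.size, lon_in.size),  # (n_out, n_in)
)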

apply_weights needs to know the shapes of the input and output grids.

[16]:
lon_out.shape
[16]:
(300, 400)
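
Internally, apply_weights essentially flattens the horizontal dimensions, applies the sparse matrix, and reshapes the result back. A simplified sketch of that logic (ignoring the dtype and masking handling done by xESMF):

def apply_weights_sketch(weights, data_in, shape_in, shape_out):
    # keep extra dimensions (e.g. time, lev) in front, collapse the grid dimensions
    extra_shape = data_in.shape[:-2]
    data_flat = data_in.reshape(-1, shape_in[0] * shape_in[1])
    # sparse matrix multiplication over grid points: (n_out, n_in) x (n_in, n_extra)
    data_out_flat = weights.dot(data_flat.T).T
    return data_out_flat.reshape(extra_shape + shape_out)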
[17]:
%%time
data_out_scipy = apply_weights(weights, data_in, lon_in.shape, lon_out.shape)
CPU times: user 842 ms, sys: 776 ms, total: 1.62 s
Wall time: 6.07 s

It is considerably faster than ESMPy’s native method (roughly a factor of two in this run). This conclusion seems to be robust across different platforms (feel free to verify it on your own machine), so we choose Scipy as the default backend.

A likely explanation for this performance discrepancy is that the original ESMF is optimized for large processor counts (~1000 CPUs) at the expense of serial performance (ESMF team, personal communication).

[18]:
data_out_scipy.shape  # broadcasted over extra dimensions
[18]:
(10, 50, 300, 400)
[19]:
np.testing.assert_equal(data_out_scipy, data_out_esmpy)  # exactly the same
[20]:
os.remove(filename)  # clean-up