Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

JiaweiZhuang · 2019-11-17T02:03:28Z

I'd like to implement a new regridder save-load API to resolve #11. ESMPy 8.0.0 is the prerequisite for this, and is now available on Conda conda-forge/esmpy-feedstock#26

Before implementing this, I'd like users of xESMF to review and comment on the new API, because there will be backward-incompatible changes.

Description of the changes

The current & old API (until 0.2.x) is:

regridder = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename
    )
regridder_reload = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename, reuse_weights=True
    )

The first call writes out a NetCDF file containing regridding weights, and reads the file back to memory to construct a SciPy sparse matrix (representing the regridding operation). Such extra I/O exists because ESMPy < 8.0.0 can only dump weights to disk. ESMPy 8.0.0 allows in-memory retrival of weights as numpy arrays, so such disk I/O is no longer needed.

The second call, with reuse_weights=True, detects that the weight file exists and simply reads it back without re-computation.

filename is an optional kwarg. If not specified, the regridder file will be given a default name like bilinear_25x53_59x87.nc, indicating (regrid method, input grid shape, output grid shape).

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

The first call only constructs the regridder in-memory, and will not write any files to disk.

Instead of implicitly handing the save/load as part of xesmf.Regridder() call, the new save() and load_regridder() do so explicitly. They are similar to save()/load_model() in Keras or save/load() in PyTorch.

The original reuse_weights and filename kwargs will be removed in xesmf.Regridder().

Major question on backward-compatibility

Should the reuse_weights=True kwarg be kept for backward-compatibility? My inclination is no, as there should only be one unambiguous way to load the regridder.

However, one convenient thing about the old xesmf.Regridder(..., reuse_weights=True) API is that this single line of code works in any cases. If the weight file doesn't exist on disk, xesmf builds it and writes to disk; if exists, xesmf reads it back without re-computation.

The same logic with the new API would be:

filename = 'your-regridder-name.nc'
if os.path.exists(filename):
    regridder = xesmf.load_regridder(filename)
else:
    regridder = xesmf.Regridder(grid_in, grid_out, method)
    regridder.save(filename)

This is more verbose, but also more explicit, and explicit is better than implicit

Minor questions

Should it be xesmf.load_regridder() or just xesmf.load()? I prefer the former as it's more explicit.
Should regridder.save() has a default filename? I slightly prefer no, but some people might have a hard time choosing a name :) Can still use the current default naming scheme like bilinear_25x53_59x87.nc. Welcome other suggestions.

The text was updated successfully, but these errors were encountered:

zhonghua-zheng · 2019-11-17T02:37:20Z

Major question: I think it should be kept.
Minor questions:
1.I prefer xesmf.load(), is there any other thing that we can load?
2.I think we have to define the name before saving... But if you want set a default filename, you have to remind the users if the filename already exists.

JiaweiZhuang · 2019-11-17T03:17:31Z

is there any other thing that we can load?

Good point... I haven't thought of a case. One concern is that load() can be confused with the frequently used xarray.Dataset.load() which reads lazy data into memory. xesmf.load_regridder() looks more similar to the naming of xarray.load_dataset() which also reads a file from disk.

Major question: I think it should be kept.

I am thinking about catching the use_weights and filename kwargs, throwing an instructive error message, and maybe directing people to this issue. This seems good enough for not breaking the current user workflow?

spencerahill · 2019-11-19T00:50:37Z

Should the reuse_weights=True kwarg be kept for backward-compatibility?

This new API seems like a nice step forward, and the package is young enough that you shouldn't worry too much about backwards compatibility IMO. It's still 0.X after all 😄 Thanks for xESMF! It's an excellent tool.

ahuang11 · 2019-11-22T03:55:36Z

Would it be helpful to document whether the longitudes range from -180:180 vs 0:360 in the filename like: bilinear_25x53_59x87_180.nc If I initially have a file that ranges from -180 to 180, but then have another file with the same dimensions from 0 to 360 and I reuse weights, I think the result is inaccurate.

Also, I think it'd be nice to have an output_dir kwarg under the save method so that users can keep the default format, but be able to output the save in a scratch directory.

What about xesmf.open_regridder() like xr.open_dataset?

I prefer having a default filename since naming is hard.

huard · 2020-04-16T01:01:47Z

regridder.save would have to save not only the weights, but also the parameters passed to the original Regridder call, including the lat and lon coordinates, right ? This could create headaches if you want to maintain backward compatibility to files created with earlier versions.

Maybe a simpler option would be to have a weights argument in Regridder :

weights : None, str, Path, xarray.Dataset, scipy.sparse.coo_matrix, dict
  Regridding weights, stored as a sparse coo matrix, a dictionary with keys `row_dst`, `col_src` and 
  `weights`, an xarray.Dataset with data variables `col`, `row` and `S`, or a path to a netCDF file 
  created by ESMF. If None, compute the weights.

An alternative to save could be to_netcdf, which would save the weights only as a file using the same naming as ESMF.

FeiYao-Edinburgh · 2020-05-08T16:16:46Z

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

I encountered the same MemoryError when doing loops. I have updated xESMF to 0.3.0 but have not found these methods. Does this mean the backward-compatibility has been kept? With the latest version and the same calling, I still see the memory cannot be released timely, see my following screenshot. Did I miss something?

huard · 2020-05-08T17:11:33Z

The API changes proposed above have not been released, nor merged into master. If you want to take a look at the branch #91 and provide feedback, that would be useful.

Add how-to instructions on how to cut a new release

JiaweiZhuang pinned this issue Nov 21, 2019

This was referenced Nov 21, 2019

Print Statement Suppression #77

Closed

High memory usage #53

Open

JiaweiZhuang mentioned this issue Dec 3, 2019

Value of cells in the new grid that are outside the old grid's domain #15

Open

huard mentioned this issue Apr 16, 2020

Weights are kept in memory instead of being written to disk. #91

Closed

huard mentioned this issue Aug 3, 2020

Weights are returned in memory instead of being written to disk pangeo-data/xESMF#3

Merged

aulemahal pushed a commit to Ouranosinc/xESMF that referenced this issue Feb 18, 2021

Merge pull request JiaweiZhuang#75 from pangeo-data/deploy-doc

ecd15a0

Add how-to instructions on how to cut a new release

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

JiaweiZhuang commented Nov 17, 2019 •

edited

Loading

zhonghua-zheng commented Nov 17, 2019

JiaweiZhuang commented Nov 17, 2019

spencerahill commented Nov 19, 2019

ahuang11 commented Nov 22, 2019 •

edited

Loading

huard commented Apr 16, 2020

FeiYao-Edinburgh commented May 8, 2020 •

edited

Loading

huard commented May 8, 2020

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Comments

JiaweiZhuang commented Nov 17, 2019 • edited Loading

Description of the changes

Major question on backward-compatibility

Minor questions

zhonghua-zheng commented Nov 17, 2019

JiaweiZhuang commented Nov 17, 2019

spencerahill commented Nov 19, 2019

ahuang11 commented Nov 22, 2019 • edited Loading

huard commented Apr 16, 2020

FeiYao-Edinburgh commented May 8, 2020 • edited Loading

huard commented May 8, 2020

JiaweiZhuang commented Nov 17, 2019 •

edited

Loading

ahuang11 commented Nov 22, 2019 •

edited

Loading

FeiYao-Edinburgh commented May 8, 2020 •

edited

Loading