Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed new regridder save-load API in 0.3.0 (not backward compatible) #75

Open
JiaweiZhuang opened this issue Nov 17, 2019 · 7 comments · Fixed by pangeo-data/xESMF#3

Comments

@JiaweiZhuang
Copy link
Owner

JiaweiZhuang commented Nov 17, 2019

I'd like to implement a new regridder save-load API to resolve #11. ESMPy 8.0.0 is the prerequisite for this, and is now available on Conda conda-forge/esmpy-feedstock#26

Before implementing this, I'd like users of xESMF to review and comment on the new API, because there will be backward-incompatible changes.

Description of the changes

The current & old API (until 0.2.x) is:

regridder = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename
    )
regridder_reload = xesmf.Regridder(
    grid_in, grid_out, method, filename=filename, reuse_weights=True
    )

The first call writes out a NetCDF file containing regridding weights, and reads the file back to memory to construct a SciPy sparse matrix (representing the regridding operation). Such extra I/O exists because ESMPy < 8.0.0 can only dump weights to disk. ESMPy 8.0.0 allows in-memory retrival of weights as numpy arrays, so such disk I/O is no longer needed.

The second call, with reuse_weights=True, detects that the weight file exists and simply reads it back without re-computation.

filename is an optional kwarg. If not specified, the regridder file will be given a default name like bilinear_25x53_59x87.nc, indicating (regrid method, input grid shape, output grid shape).

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

The first call only constructs the regridder in-memory, and will not write any files to disk.

Instead of implicitly handing the save/load as part of xesmf.Regridder() call, the new save() and load_regridder() do so explicitly. They are similar to save()/load_model() in Keras or save/load() in PyTorch.

The original reuse_weights and filename kwargs will be removed in xesmf.Regridder().

Major question on backward-compatibility

Should the reuse_weights=True kwarg be kept for backward-compatibility? My inclination is no, as there should only be one unambiguous way to load the regridder.

However, one convenient thing about the old xesmf.Regridder(..., reuse_weights=True) API is that this single line of code works in any cases. If the weight file doesn't exist on disk, xesmf builds it and writes to disk; if exists, xesmf reads it back without re-computation.

The same logic with the new API would be:

filename = 'your-regridder-name.nc'
if os.path.exists(filename):
    regridder = xesmf.load_regridder(filename)
else:
    regridder = xesmf.Regridder(grid_in, grid_out, method)
    regridder.save(filename)

This is more verbose, but also more explicit, and explicit is better than implicit

Minor questions

  • Should it be xesmf.load_regridder() or just xesmf.load()? I prefer the former as it's more explicit.
  • Should regridder.save() has a default filename? I slightly prefer no, but some people might have a hard time choosing a name :) Can still use the current default naming scheme like bilinear_25x53_59x87.nc. Welcome other suggestions.
@zhonghua-zheng
Copy link

Major question: I think it should be kept.
Minor questions:
1.I prefer xesmf.load(), is there any other thing that we can load?
2.I think we have to define the name before saving... But if you want set a default filename, you have to remind the users if the filename already exists.

@JiaweiZhuang
Copy link
Owner Author

is there any other thing that we can load?

Good point... I haven't thought of a case. One concern is that load() can be confused with the frequently used xarray.Dataset.load() which reads lazy data into memory. xesmf.load_regridder() looks more similar to the naming of xarray.load_dataset() which also reads a file from disk.

Major question: I think it should be kept.

I am thinking about catching the use_weights and filename kwargs, throwing an instructive error message, and maybe directing people to this issue. This seems good enough for not breaking the current user workflow?

@spencerahill
Copy link

Should the reuse_weights=True kwarg be kept for backward-compatibility?

This new API seems like a nice step forward, and the package is young enough that you shouldn't worry too much about backwards compatibility IMO. It's still 0.X after all 😄 Thanks for xESMF! It's an excellent tool.

@JiaweiZhuang JiaweiZhuang pinned this issue Nov 21, 2019
This was referenced Nov 21, 2019
@ahuang11
Copy link

ahuang11 commented Nov 22, 2019

Would it be helpful to document whether the longitudes range from -180:180 vs 0:360 in the filename like: bilinear_25x53_59x87_180.nc If I initially have a file that ranges from -180 to 180, but then have another file with the same dimensions from 0 to 360 and I reuse weights, I think the result is inaccurate.

Also, I think it'd be nice to have an output_dir kwarg under the save method so that users can keep the default format, but be able to output the save in a scratch directory.

What about xesmf.open_regridder() like xr.open_dataset?

I prefer having a default filename since naming is hard.

@huard
Copy link

huard commented Apr 16, 2020

regridder.save would have to save not only the weights, but also the parameters passed to the original Regridder call, including the lat and lon coordinates, right ? This could create headaches if you want to maintain backward compatibility to files created with earlier versions.

Maybe a simpler option would be to have a weights argument in Regridder :

weights : None, str, Path, xarray.Dataset, scipy.sparse.coo_matrix, dict
  Regridding weights, stored as a sparse coo matrix, a dictionary with keys `row_dst`, `col_src` and 
  `weights`, an xarray.Dataset with data variables `col`, `row` and `S`, or a path to a netCDF file 
  created by ESMF. If None, compute the weights. 

An alternative to save could be to_netcdf, which would save the weights only as a file using the same naming as ESMF.

@FeiYao-Edinburgh
Copy link

FeiYao-Edinburgh commented May 8, 2020

The new API (starting 0.3.0) will be:

regridder = xesmf.Regridder(grid_in, grid_out, method)
regridder.save(filename)
regridder_reload = xesmf.load_regridder(filename)

I encountered the same MemoryError when doing loops. I have updated xESMF to 0.3.0 but have not found these methods. Does this mean the backward-compatibility has been kept? With the latest version and the same calling, I still see the memory cannot be released timely, see my following screenshot. Did I miss something?

image

@huard
Copy link

huard commented May 8, 2020

The API changes proposed above have not been released, nor merged into master. If you want to take a look at the branch #91 and provide feedback, that would be useful.

aulemahal pushed a commit to Ouranosinc/xESMF that referenced this issue Feb 18, 2021
Add how-to instructions on how to cut a new release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants