Add timing info to netcdf-4 logging #2903

Open
edwardhartnett opened this issue Apr 3, 2024 · 2 comments

Comments

@edwardhartnett
Contributor

We have netcdf-4 logging and it has a lot of useful information. Here at NOAA it's being used to debug problems on big HPC systems.

One piece of information that would be super useful is timing for data reads/writes.

What I have in mind is a new constant for nc_set_log_level() that would turn on timing of reads/writes and write the results to the log(s). This would help large data producers/readers figure out their IO performance on HPC systems.
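A minimal sketch of how such a constant might gate the timing output. It assumes a hypothetical level named `NC_LOG_TIMING` (netcdf-c has no such constant) and a stand-in variable for the level that `nc_set_log_level()` records; all function names here are illustrative only. Timing lines are produced only once the level is raised, so existing logging is untouched.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical level that turns on read/write timing. Not an existing
 * netcdf-c constant; the value is illustrative only. */
#define NC_LOG_TIMING 3

/* Stand-in for the level recorded by nc_set_log_level(). */
static int sketch_log_level = 0;

/* Set a new level and return the previous one. */
static int sketch_set_log_level(int new_level)
{
    int old = sketch_log_level;
    sketch_log_level = new_level;
    return old;
}

/* Format one timing record into buf if timing is enabled.
 * Returns the number of characters written, or 0 if timing is off
 * or the buffer is too small. */
static int sketch_log_timing(char *buf, size_t len, const char *op,
                             const char *var, size_t nbytes, double secs)
{
    if (sketch_log_level < NC_LOG_TIMING)
        return 0;
    int n = snprintf(buf, len, "%s %s: %zu bytes in %.6f s",
                     op, var, nbytes, secs);
    return (n < 0 || (size_t)n >= len) ? 0 : n;
}
```

For example, after `sketch_set_log_level(NC_LOG_TIMING)`, a 4096-byte write to `temp` that took 0.5 s is formatted as `put_vars temp: 4096 bytes in 0.500000 s`.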

IO is increasingly the limiting factor: computation is no problem, but writing all that data takes too long! Detailed info on where the time goes would help users optimize large modeling systems.

@DennisHeimbigner
Collaborator

The timing needs a defined interval. Presumably we would measure some specific HDF5 API call. But what about caching?
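One way to pin down the interval, sketched under the assumption that the timer brackets the single HDF5 call made inside put/get_vars. It uses POSIX `clock_gettime()` with `CLOCK_MONOTONIC`, which is not affected by system clock adjustments; the `io_timer` type and its functions are hypothetical. Because this is wall-clock time as seen by the caller, a request served from a cache simply shows up as fast, which is one reason the caching question matters.

```c
/* clock_gettime() and CLOCK_MONOTONIC are POSIX, not ISO C; request them
 * before any system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Hypothetical timer handle bracketing one library call. */
typedef struct {
    struct timespec start;
} io_timer;

/* Seconds between two timestamps (end is assumed not earlier than start). */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Open the interval, e.g. just before the HDF5 read or write. */
static void io_timer_start(io_timer *t)
{
    clock_gettime(CLOCK_MONOTONIC, &t->start);
}

/* Close the interval just after the call returns and report its length. */
static double io_timer_stop(const io_timer *t)
{
    struct timespec end;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_seconds(t->start, end);
}
```

Timing only the innermost HDF5 call isolates storage time from netCDF bookkeeping; timing the whole put/get_vars call instead would include type conversion and other library overhead.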

@edwardhartnett
Contributor Author

I would add timing to put/get_vars().

Caching would still be in effect, and that certainly complicates the picture, but right now they don't even have a good idea of how each model uses I/O. Overall numbers would help them tune the caching to improve performance.

What I have in mind is something very simple, just a few extra lines of code to provide basic read/write times in the log. Of course, the profiler is also available to anyone who wants more detailed info.
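The overall numbers could be kept as simple running totals that put/get_vars update after each timed call and that are logged once, for example when the file is closed. This is a hypothetical sketch; the `io_totals` type and its functions are not part of netcdf-c.

```c
#include <stddef.h>

/* Hypothetical running totals for one direction (reads or writes). */
typedef struct {
    unsigned long calls; /* number of timed put/get_vars calls */
    size_t bytes;        /* total bytes moved by those calls */
    double seconds;      /* total time spent in those calls */
} io_totals;

/* Fold one timed call into the totals. */
static void io_totals_add(io_totals *t, size_t nbytes, double secs)
{
    t->calls++;
    t->bytes += nbytes;
    t->seconds += secs;
}

/* Average throughput in MiB/s, or 0 if nothing has been timed yet. */
static double io_totals_mib_per_sec(const io_totals *t)
{
    if (t->seconds <= 0.0)
        return 0.0;
    return ((double)t->bytes / (1024.0 * 1024.0)) / t->seconds;
}
```

Logging the totals once keeps the log short even when a model calls put_vars many times; the trade-off is that per-call variation, including the effect of caching, is averaged away.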

In PIO I added optional support for MPE. This is a little more involved, but it gives excellent output for parallel programs, something like this:

[image: example MPE output]
