Add a walk function for navigating THREDDS catalogue #754

Open
tlogan2000 opened this issue Dec 21, 2023 · 2 comments · May be fixed by #755

Comments

@tlogan2000

As far as I know this functionality does not already exist, but I believe it would be a welcome addition:

I often need to find all datasets across multiple subfolders of a THREDDS catalogue. To do this I resort to a custom function in my data processing scripts (see the simple example below), but ideally this would be built into siphon itself.

import logging

import requests
from siphon.catalog import TDSCatalog

LOGGER = logging.getLogger(__name__)

# walk function
def walk(cat, depth=1):
    """Return a generator walking a THREDDS data catalog for datasets.

    Parameters
    ----------
    cat : TDSCatalog
      THREDDS catalog.
    depth : int
      Maximum recursive depth. Setting 0 will return only datasets within the top-level catalog. If None,
      depth is set to 1000.
    """
    yield from cat.datasets.items()
    if depth is None:
        depth = 1000

    if depth > 0:
        for name, ref in cat.catalog_refs.items():
            try:
                child = ref.follow()
                yield from walk(child, depth=depth - 1)

            except requests.HTTPError as exc:
                LOGGER.exception(exc)

# create catalogue (urlcat is the URL of the top-level catalog XML, defined elsewhere)
cat = TDSCatalog(urlcat)
# access all datasets down to 20 levels of subfolders
for dd in walk(cat, depth=20):
    print(dd)

@dopplershift
Member

This seems like it could be a nice addition. Would you be interested in submitting a PR adding it? My only question is whether yielding from items() (so name, Dataset pairs) makes the most sense, or whether just the Dataset would be enough, since you could still get the name from ds.name?
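To make the second option concrete, here is a rough sketch of a variant that yields only the Dataset objects. The name `walk_datasets` is made up for illustration and is not part of siphon. The function only assumes the catalog exposes dict-like `datasets` and `catalog_refs` and that each ref has `follow()`, as in the example above. It catches `OSError` on the assumption that the failures of interest are request errors, since `requests` exceptions derive from it.

```python
import logging

LOGGER = logging.getLogger(__name__)


def walk_datasets(cat, depth=1):
    """Yield Dataset objects from ``cat`` and its child catalogs.

    Hypothetical helper (not part of siphon). ``cat`` only needs dict-like
    ``datasets`` and ``catalog_refs`` attributes, and each catalog ref
    needs a ``follow()`` method returning the child catalog.
    """
    # Yield the datasets themselves; the name stays available as ds.name.
    yield from cat.datasets.values()

    if depth is None:
        depth = 1000

    if depth > 0:
        for ref in cat.catalog_refs.values():
            try:
                child = ref.follow()
            except OSError as exc:
                # requests exceptions (including HTTPError) subclass OSError,
                # so an unreachable child catalog is logged and skipped.
                LOGGER.warning("Skipping catalog ref: %s", exc)
                continue
            yield from walk_datasets(child, depth=depth - 1)
```

With a real catalog this would be used as `for ds in walk_datasets(cat, depth=20): print(ds.name)`. A caller who still wants pairs can build them with `(ds.name, ds)`.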

@tlogan2000
Author

tlogan2000 commented Jan 8, 2024

@dopplershift Sorry for the delay. Yes, I can try to put something together in the coming weeks. Would the most logical place for the addition simply be a new method on the catalogue class?
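For discussion, a method version might look roughly like the sketch below. It is written as a standalone mixin so it runs without siphon. `CatalogWalkMixin` is a hypothetical name, and how this would actually be wired into the catalogue class is an open question, not something siphon currently does.

```python
class CatalogWalkMixin:
    """Hypothetical mixin adding ``walk`` to a catalog-like class.

    Assumes the host class provides dict-like ``datasets`` and
    ``catalog_refs``, and that each ref has ``follow()``.
    """

    def walk(self, depth=1):
        """Yield datasets from this catalog and its children, up to ``depth`` levels."""
        yield from self.datasets.values()

        if depth is None:
            depth = 1000

        if depth > 0:
            for ref in self.catalog_refs.values():
                child = ref.follow()
                # The child may be a plain catalog object without walk(),
                # so call the mixin's function on it directly.
                yield from CatalogWalkMixin.walk(child, depth=depth - 1)
```

If the method lived on the catalogue class itself, the recursion could simply be `child.walk(depth=depth - 1)`, and a call such as `cat.walk(depth=20)` would replace the free function.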

@tlogan2000 tlogan2000 linked a pull request Jan 8, 2024 that will close this issue