Is your feature request related to a problem? Please describe.
We need a simpler caching & checkpointing story that has full service visibility into what's going on.
Describe the solution you'd like
Checkpointing -- i.e. cache outputs and restart from the latest point.
Intelligent Caching -- i.e. cache nodes and only rerun things if code or data has changed.
These should come with the ability to:
visualize what is going on when using them.
work in a notebook / CLI / library context
customize how data is hashed & where it is stored
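To make the "only rerun things if code or data has changed" idea concrete, here is a minimal sketch in plain Python. It is not Hamilton's implementation: `fingerprint` and `CodeAwareCache` are hypothetical names, the in-memory dict stands in for whatever storage backend is chosen, and hashing the compiled bytecode is just one possible way to detect code changes.

```python
import hashlib
import pickle


def fingerprint(fn, kwargs):
    """Hash a function's compiled code together with its input values."""
    h = hashlib.sha256()
    h.update(fn.__code__.co_code)                    # changes when the logic changes
    h.update(repr(fn.__code__.co_consts).encode())   # literals used by the function
    h.update(pickle.dumps(sorted(kwargs.items())))   # the data passed in
    return h.hexdigest()


class CodeAwareCache:
    """Hypothetical cache keyed on code + data, with a log for visibility."""

    def __init__(self):
        self.store = {}  # fingerprint -> result (assumption: swap for disk storage)
        self.log = []    # (node name, "hit" or "miss"), in call order

    def call(self, fn, **kwargs):
        key = fingerprint(fn, kwargs)
        if key in self.store:
            self.log.append((fn.__name__, "hit"))
        else:
            self.log.append((fn.__name__, "miss"))
            self.store[key] = fn(**kwargs)
        return self.store[key]


def double(x):
    return x * 2


cache = CodeAwareCache()
first = cache.call(double, x=3)   # nothing cached yet: computed
second = cache.call(double, x=3)  # same code, same data: served from the cache
third = cache.call(double, x=4)   # data changed: recomputed
```

The `log` list is the hook for the visibility requirement: anything that renders it (a notebook cell, a CLI table, a UI) can show which nodes were recomputed and which were reused.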
Prior art
You do it yourself outside of Hamilton: run your code, save the outputs you want, then load them and inject them into the graph as pre-computed values through the overrides argument, i.e. .execute/.materialize(..., overrides={...}). TODO: show example.
You use the data savers & data loaders (i.e. materializers). This is similar to the above, except that the materializers save the data and later load and inject it. TODO: show example.
You use the CachingGraphAdapter, which requires you to tag functions to cache along with the serialization format.
You use the DiskCacheAdapter, which uses the diskcache library to store the results on disk.
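The first option above (save outputs yourself, then inject them) can be sketched without any Hamilton dependency. The `execute` function below is a toy stand-in for a driver, not Hamilton's API; it only mimics the shape of the `overrides={...}` argument mentioned above. The node names and the pickle-based checkpoint file are assumptions made for illustration.

```python
import inspect
import pickle
import tempfile
from pathlib import Path


def raw_data() -> list:
    return [1, 2, 3]


def cleaned(raw_data: list) -> list:
    return [v * 10 for v in raw_data]


def total(cleaned: list) -> int:
    return sum(cleaned)


NODES = {f.__name__: f for f in (raw_data, cleaned, total)}
calls = []  # records which nodes actually ran


def execute(final_var, overrides=None):
    """Toy driver: dependencies are the parameter names; overrides win."""
    overrides = overrides or {}
    if final_var in overrides:
        return overrides[final_var]  # injected value, nothing upstream runs
    fn = NODES[final_var]
    kwargs = {dep: execute(dep, overrides) for dep in inspect.signature(fn).parameters}
    calls.append(final_var)
    return fn(**kwargs)


# First run: compute an intermediate result and checkpoint it to disk.
checkpoint = Path(tempfile.mkdtemp()) / "cleaned.pkl"
checkpoint.write_bytes(pickle.dumps(execute("cleaned")))
calls.clear()

# Later run: load the saved value and inject it instead of recomputing it.
saved = pickle.loads(checkpoint.read_bytes())
result = execute("total", overrides={"cleaned": saved})
```

After the second run, `calls` contains only `total`, which is the point of the pattern: everything upstream of the injected value is skipped.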
I took one of the targets examples and transferred it one-to-one to hamilton to see how the concepts compare. Both workflows are implemented in modules and make use of helper functions from a separate module. Then both are started interactively from quarto documents and their results and graphs visualized in the rendered output of said notebooks: http://jmbuhr.de/targets-hamilton-comparison/ (source code: https://github.com/jmbuhr/targets-hamilton-comparison)
Again, this is for exploration of possibilities, not to impose paradigms on you :)
In this first pass I noticed two things I was missing in hamilton compared to targets when it comes to caching:
changing a function that is used by a node, but is not itself a node, should also invalidate the cache of the node
loading the cached result from any node independently of the dr.execute run as with tar_load(<name of node>) (https://docs.ropensci.org/targets/reference/tar_load.html) is super helpful for interactively picking up where you left off with a workflow and working on different parts of it.
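For the first point, one possible approach is to fold the code of referenced helper functions into the node's own hash. The sketch below is an assumption about how this could work, not how Hamilton or targets do it: `code_hash` is a hypothetical name, and it only follows plain module-level functions that the node refers to by name.

```python
import hashlib
import inspect


def code_hash(fn, seen=None):
    """Hash fn's bytecode plus that of module-level functions it references."""
    seen = set() if seen is None else seen
    seen.add(fn)  # guard against helpers that call each other recursively
    h = hashlib.sha256(fn.__code__.co_code)
    h.update(repr(fn.__code__.co_consts).encode())
    for name in fn.__code__.co_names:  # global names used inside fn
        helper = fn.__globals__.get(name)
        if inspect.isfunction(helper) and helper not in seen:
            h.update(code_hash(helper, seen).encode())
    return h.hexdigest()


def scale(x):  # plain helper, not a node
    return x * 2


def scaled_value(value):  # the "node"; it depends on the helper above
    return scale(value)


before = code_hash(scaled_value)


def scale(x):  # edit only the helper
    return x * 3


after = code_hash(scaled_value)  # differs from `before`, so the cache is stale
```

A cache keyed on `code_hash(scaled_value)` would therefore be invalidated by the edit to `scale`, even though the node function itself did not change. Helpers reached through closures, methods, or imported modules are not covered by this sketch.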
Prior art
Could use https://books.ropensci.org/targets/walkthrough.html#change-code as inspiration.
Additional context
Slack threads:
Next steps:
TODO: write up tasks in this issue into smaller and manageable chunks.