Halve model loading time for llama demo #4032

swolchok · 2024-06-21T17:33:01Z

Summary:
mmap is not recommended for large sequential workloads -- the file
gets paged in through page faults as it is touched, and that adds up
when loading a whole model. Surprisingly, this change doesn't seem
to hurt reported peak memory usage.

Differential Revision: D58826044
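
For readers who want to see the two loading strategies side by side, here is a minimal Python sketch. It is not the code from this PR (the actual change is in the llama demo and is not shown in this thread); `load_with_mmap`, `load_with_read`, and `make_fake_weights` are hypothetical names, and the temporary file only stands in for a model file.

```python
import mmap
import os
import tempfile


def load_with_mmap(path: str) -> bytes:
    """Map the file, then copy it out; pages are faulted in as they are touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies the mapped bytes into an ordinary bytes object.
            return mapped[:]


def load_with_read(path: str) -> bytes:
    """Read the whole file into one buffer with ordinary sequential reads."""
    with open(path, "rb") as f:
        return f.read()


def make_fake_weights(size_bytes: int) -> str:
    """Write a throwaway file standing in for a model file (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path
```

Both functions return the same bytes; only the way the data reaches memory differs. Timing them on a real model file (for example with `time.perf_counter`) is the way to check whether the speedup claimed in the title carries over to a given platform.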

pytorch-bot · 2024-06-21T17:33:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4032

✅ No Failures

As of commit 4d922b4 with merge base 3eec95a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-06-21T17:33:24Z

This pull request was exported from Phabricator. Differential Revision: D58826044

facebook-github-bot · 2024-06-22T00:33:25Z

This pull request was exported from Phabricator. Differential Revision: D58826044

Summary:
Pull Request resolved: pytorch#4032

mmap is not recommended for large sequential workloads -- you have to take a bunch of page faults. Surprisingly, this doesn't seem to hurt reported peak memory usage.

Differential Revision: D58826044

facebook-github-bot · 2024-06-28T06:22:04Z

This pull request was exported from Phabricator. Differential Revision: D58826044

Summary:
Pull Request resolved: pytorch#4032

mmap is not recommended for large sequential workloads -- you have to take a bunch of page faults. I originally assumed this would hurt peak memory usage (we read all the weights into memory at once and then pack them; packing is basically copying them), but it doesn't. In retrospect, this makes sense because we actually operate on one weights tensor at a time, and the individual tensors aren't gigantic; there are just a lot of them.

Reviewed By: larryliu0820

Differential Revision: D58826044
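
The peak-memory reasoning above can be illustrated with some simple arithmetic. This is only one reading of the argument, under the assumption that the copy made while packing a tensor is the extra memory of interest and that tensors are handled strictly one at a time; the function names are hypothetical and nothing here measures ExecuTorch itself.

```python
from typing import Sequence


def extra_bytes_copy_everything(tensor_sizes: Sequence[int]) -> int:
    """Extra bytes in flight if every tensor were copied before any copy is done."""
    return sum(tensor_sizes)


def extra_bytes_one_tensor_at_a_time(tensor_sizes: Sequence[int]) -> int:
    """Extra bytes in flight if only one tensor is being copied at any moment."""
    return max(tensor_sizes, default=0)
```

With many moderately sized tensors, the first number grows with the tensor count while the second stays at the size of the largest single tensor, which matches the observation that there are "just a lot of them".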

facebook-github-bot · 2024-06-28T22:36:11Z

This pull request was exported from Phabricator. Differential Revision: D58826044

facebook-github-bot added the CLA Signed label Jun 21, 2024

facebook-github-bot added the fb-exported label Jun 21, 2024

swolchok requested review from kimishpatel, shoumikhin, larryliu0820 and mcr229 June 21, 2024 21:06

larryliu0820 approved these changes Jun 21, 2024

swolchok force-pushed the export-D58826044 branch from b8da11a to 01ff5b5 Compare June 22, 2024 00:33

swolchok force-pushed the export-D58826044 branch from 01ff5b5 to 8daba50 Compare June 28, 2024 06:22

swolchok force-pushed the export-D58826044 branch from 8daba50 to 4d922b4 Compare June 28, 2024 22:36
