[v.2.4.0] Release Tracker #128436

Open
atalman opened this issue Jun 11, 2024 · 34 comments
Labels: oncall: releng, release tracker, triaged
Milestone: 2.4.0

Comments

@atalman
Contributor

atalman commented Jun 11, 2024

We cut a release branch for the 2.4.0 release.

Our plan from this point is roughly:

  • Phase 1 (until 7/1/24): work on finalizing the release branch
  • Phase 2 (after 7/1/24): perform extended integration/stability/performance testing based on Release Candidate builds.

This issue is for tracking cherry-picks to the release branch.

Cherry-Pick Criteria

Phase 1 (until 7/1/24):

Only low-risk changes may be cherry-picked from main:

  1. Fixes to regressions against the most recent minor release (e.g. 2.3.x for this release; see module: regression issue list)
  2. Critical fixes for: silent correctness, backwards compatibility, crashes, deadlocks, (large) memory leaks
  3. Critical fixes to new features introduced in the most recent minor release (e.g. 2.3.x for this release)
  4. Test/CI fixes
  5. Documentation improvements
  6. Compilation fixes or ifdefs required for different versions of the compilers or third-party libraries
  7. Release branch specific changes (e.g. change version identifiers)

Any other change requires special dispensation from the release managers (currently @atalman, @PaliC, @huydhn, @malfet). If this applies to your change, please write "Special Dispensation" in the "Criteria Category:" field of the template below and explain why.

Phase 2 (after 7/1/24):

Note that changes here require us to rebuild a Release Candidate and restart extended testing (likely delaying the release). Therefore, the only accepted changes are release-blocking critical fixes for: silent correctness, backwards compatibility, crashes, deadlocks, (large) memory leaks.

Changes will likely require a discussion with the larger release team over VC or Slack.

Cherry-Pick Process

  1. Ensure your PR has landed in main. This does not apply to release-branch-specific changes (see Phase 1 criteria).

  2. Create (but do not land) a PR against the release branch.

    # Find the hash of the commit you want to cherry-pick
    # (for example, abcdef12345)
    git log
    
    git fetch origin release/2.4
    git checkout release/2.4
    git cherry-pick -x abcdef12345
    
    # Submit a PR against 'release/2.4' either:
    # via the GitHub UI (after pushing to your fork)
    git push my-fork
    
    # or via the GitHub CLI
    gh pr create --base release/2.4
  3. Make a request below with the following format:

Link to landed trunk PR (if applicable):
* 

Link to release branch PR:
* 

Criteria Category:
* 
  4. Someone from the release team will reply with approved / denied or ask for more information.
  5. If approved, someone from the release team will merge your PR once the tests pass. Do not land the release branch PR yourself.

NOTE: Our normal tools (ghstack / ghimport, etc.) do not work on the release branch.
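Since ghstack and ghimport are unavailable here, the manual steps in item 2 above can be wrapped in a small shell script. This is a sketch under stated assumptions, not an official PyTorch tool: the function names and the local branch naming scheme are hypothetical, and the remotes `origin` and `my-fork` are the ones used in the instructions above. It relies on the documented behavior of `git cherry-pick -x`, which appends a `(cherry picked from commit <full hash>)` line to the commit message, and checks for that line before pushing.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers wrapping the manual cherry-pick steps.
# Assumes a remote named "origin" (pytorch/pytorch) and one named "my-fork".

# Succeeds if a commit message contains the line that `git cherry-pick -x`
# appends: "(cherry picked from commit <full hash>)". An abbreviated hash
# also matches, because it is a prefix of the full hash.
has_cherry_pick_trailer() {
  local message="$1" hash="$2"
  [[ "$message" == *"(cherry picked from commit ${hash}"* ]]
}

# Cherry-picks one landed trunk commit onto a new local branch created from
# the release branch, verifies the trailer, and pushes the branch to your fork.
# Open the PR afterwards, e.g. with `gh pr create --base release/2.4`.
cherry_pick_to_release() {
  local hash="$1"
  local base="${2:-release/2.4}"
  # Hypothetical naming scheme: release/2.4 -> cherry-pick-<hash>-release-2.4
  local branch="cherry-pick-${hash}-${base//\//-}"

  git fetch origin "$base" || return 1
  git checkout -b "$branch" "origin/$base" || return 1
  git cherry-pick -x "$hash" || return 1

  if ! has_cherry_pick_trailer "$(git log -1 --format=%B)" "$hash"; then
    echo "commit message lacks the cherry-pick trailer for $hash" >&2
    return 1
  fi
  git push my-fork "$branch"
}
```

Usage, assuming the helpers are saved in a file you source: `cherry_pick_to_release abcdef12345`. Creating a dedicated branch instead of committing onto a local `release/2.4` keeps your local copy of the release branch identical to `origin`, which makes a second cherry-pick request easier to prepare.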

Please note: the HUD link with the release branch CI status is provided here:
HUD

Versions

2.4.0

atalman added this to the 2.4.0 milestone Jun 11, 2024
@atalman
Contributor Author

atalman commented Jun 11, 2024

Link to landed trunk PR (if applicable):

  • NA

Link to release branch PR:

Criteria Category:

  • Release-only change: temporarily build Triton from the pin rather than the branch

@atalman merged

atalman pinned this issue Jun 11, 2024
malfet added the oncall: releng label Jun 11, 2024
@atalman
Contributor Author

atalman commented Jun 11, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@atalman
Contributor Author

atalman commented Jun 12, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@atalman
Contributor Author

atalman commented Jun 12, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Reverted on main

@malfet merged

@atalman
Contributor Author

atalman commented Jun 12, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Reverted on main

@malfet merged

@atalman
Contributor Author

atalman commented Jun 12, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Reverted on main

@malfet merged

@zhuhaozhe
Contributor

zhuhaozhe commented Jun 13, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@zou3519
Contributor

zou3519 commented Jun 13, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • (1) Fixes to regressions. In 2.4, we started spamming warnings if someone used pybind'ed functions with torch.compile. There were no such warnings in 2.3. This PR adjusts the warnings to be less spammy.

@atalman merged

soulitzer added the triaged label Jun 13, 2024
@etaf
Collaborator

etaf commented Jun 13, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@zou3519
Contributor

zou3519 commented Jun 13, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • (3) Critical fixes to new features. In 2.4 we are releasing a new torch.library.custom_op API. This PR fixes a critical bug where the API did not compose with FSDP and other distributed APIs.

@atalman merged

@chunyuan-w
Collaborator

chunyuan-w commented Jun 14, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@etaf
Collaborator

etaf commented Jun 14, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@wanchaol
Contributor

wanchaol commented Jun 14, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for a compile-time regression

@atalman merged

@wanchaol
Contributor

wanchaol commented Jun 14, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fixes for silent correctness

@atalman merged

drisspg unpinned this issue Jun 14, 2024
zou3519 pinned this issue Jun 14, 2024
@Xia-Weiwen
Collaborator

Xia-Weiwen commented Jun 15, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:


@atalman merged

@xuhancn
Collaborator

xuhancn commented Jun 17, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for PyTorch Windows cpp_extension. If a user builds an extension with cpp_extension and its code uses VEC, the build hits a sleef dependency issue.

Hi @xuhancn, this looks like feature work to enable Inductor on PyTorch Windows; however, we don't support this yet for release 2.4.
Do you have any tests for it?

@zou3519
Contributor

zou3519 commented Jun 17, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • (1) Regression (from 2.3.x) and (2) Critical fixes for: silent incorrectness

@atalman merged

@lw
Contributor

lw commented Jun 17, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Fixes to regressions against the most recent minor release (changes only logging, low risk)

@atalman merged

@clee2000
Contributor

clee2000 commented Jun 17, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Release: needed to fix the ExecuTorch release CI

@clee2000 merged

huydhn added the release tracker label Jun 17, 2024
pytorchmergebot pushed a commit that referenced this issue Jun 18, 2024
This extends the cherry-pick bot so that it automatically updates the tracker issue with the cherry-pick details. For this to work, the tracker issue needs to be open and carry the `release tracker` label, i.e. #128436. The version from the release branch, e.g. `release/2.4`, is matched against the title of the tracker issue, e.g. `[v.2.4.0] Release Tracker` or `[v.2.4.1] Release Tracker`.

### Testing

`python cherry_pick.py --onto-branch release/2.4 --classification release --fixes "DEBUG DEBUG" --github-actor huydhn 128718`

* On the PR #128718 (comment)
* On the tracker issue #128436 (comment)

Pull Request resolved: #128924
Approved by: https://github.com/atalman
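The matching rule described in this commit (the version in a branch such as `release/2.4` is compared with tracker titles such as `[v.2.4.0] Release Tracker` or `[v.2.4.1] Release Tracker`) can be illustrated with a small shell sketch. This is not the code in `cherry_pick.py`; the function name is hypothetical and the sketch only compares strings. Finding the open issues that carry the `release tracker` label is left to the caller, for example with `gh issue list --label "release tracker" --state open`.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical illustration of matching a release branch to a
# tracker issue title. Not the implementation used by cherry_pick.py.

# Succeeds if a title such as "[v.2.4.1] Release Tracker" belongs to the
# release branch given as e.g. "release/2.4".
tracker_matches_branch() {
  local branch="$1" title="$2"
  local version="${branch#release/}"   # "release/2.4" -> "2.4"

  # Only branches of the form release/<version> qualify.
  [[ "$branch" == release/* && -n "$version" ]] || return 1

  # Quoted parts match literally; the unquoted * stands in for the patch
  # release number, so both 2.4.0 and 2.4.1 trackers match release/2.4.
  [[ "$title" == "[v.${version}."*"] Release Tracker" ]]
}

# Example: print which candidate titles match release/2.4.
for title in "[v.2.4.0] Release Tracker" "[v.2.4.1] Release Tracker" "[v.2.3.1] Release Tracker"; do
  if tracker_matches_branch "release/2.4" "$title"; then
    echo "match: $title"
  fi
done
```

Keeping the trailing `.` after the version in the pattern is what stops `release/2.4` from also matching a hypothetical `[v.2.41.0]` tracker.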
@pytorchbot
Collaborator

pytorchbot commented Jun 19, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:
CI signal fix for torchbench


@atalman merged

@xuhancn
Collaborator

xuhancn commented Jun 19, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for PyTorch Windows cpp_extension. If a user builds an extension with cpp_extension and its code uses VEC, the build hits a sleef dependency issue.

Hi @xuhancn, this looks like feature work to enable Inductor on PyTorch Windows; however, we don't support this yet for release 2.4. Do you have any tests for it?

PyTorch 2.4 does not support Inductor on Windows, but it does enable Windows x86 CPU vectorization (#118980), which adds sleef as a dependency of cpp_extension. So we need to cherry-pick this PR.

@atalman
Contributor Author

atalman commented Jun 20, 2024

@pytorchbot
Collaborator

pytorchbot commented Jun 20, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:
Critical - Build py3.12 and add Triton dependency for ROCm


@atalman merged

@atalman
Contributor Author

atalman commented Jun 21, 2024

Link to landed trunk PR (if applicable):

  • N/A (will be added later)

Link to release branch PR:

Criteria Category:

  • Release-only change to fix Triton ROCm compilation. This change will be added to trunk later, when the ROCm pin is moved in main.

@atalman merged

@zou3519
Contributor

zou3519 commented Jun 21, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Documentation improvements

@huydhn merged

@zou3519
Contributor

zou3519 commented Jun 21, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Documentation improvements -- we moved the custom ops docs and need to update the error messages.

@huydhn merged

@fegin
Contributor

fegin commented Jun 21, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for crashes when loading a distributed optimizer state_dict with shared parameters. The bug was introduced in 2.4.

@fegin
Contributor

fegin commented Jun 21, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for crashes when loading HSDP1 + full state_dict with the rank0 broadcasting feature.

@jingxu10
Collaborator

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

Documentation improvements - Add documentation for XPU support to PyTorch

@atalman
Contributor Author

atalman commented Jun 21, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Release-only change required for the conda build

@atalman merged

@etaf
Collaborator

etaf commented Jun 24, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Important Intel GPU Inductor performance improvement for release 2.4.

@xuhancn
Collaborator

xuhancn commented Jun 25, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • Critical fix for PyTorch Windows cpp_extension. If a user builds an extension with cpp_extension and its code uses VEC, the build hits a sleef dependency issue.

Hi @xuhancn, this looks like feature work to enable Inductor on PyTorch Windows; however, we don't support this yet for release 2.4. Do you have any tests for it?

Hi @malfet, @atalman, I have added some test cases to prove that this is a cpp_extension bug fix: #128811 (comment). Please don't forget to merge this PR.

@mikaylagawarecki
Contributor

mikaylagawarecki commented Jun 25, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:

  • 129244 and 129251 are critical fixes to torch.serialization.add_safe_globals (added in 2.4), which did not work with weights_only in a large proportion of cases
  • 129239 adds a warning; 129396 is a documentation improvement

@pytorchbot
Collaborator

pytorchbot commented Jun 25, 2024

Link to landed trunk PR (if applicable):

Link to release branch PR:

Criteria Category:
Critical -

There is no associated issue. This was raised on slack from @mxz297 due to memory access faults using TunableOp.
