
Citation for OpenRLHF in relation to the XTuner RLHF code and architecture #770

hijkzzz opened this issue Jun 14, 2024 · 7 comments


hijkzzz commented Jun 14, 2024

Hi, XTuner Team

Could you please add a citation to OpenRLHF, the source of the Ray+vLLM-based RLHF architecture, for example in the README.md: https://github.com/InternLM/xtuner?tab=readme-ov-file#%EF%B8%8F-acknowledgement.
We noticed that most of the RLHF-related code in XTuner, particularly the Ray RLHF architecture, is refactored from OpenRLHF. Under OpenRLHF's Apache License 2.0, the original copyright statement must be retained.

An example:

XTuner's RLHF solution mainly references OpenRLHF, for which we are grateful.
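
Beyond a README acknowledgement, Apache-2.0 attribution for adapted files is usually also carried in a per-file header. A generic sketch of such a header (illustrative wording only, not text taken from either repository) could look like:

```python
# Hypothetical attribution header for a file adapted under the Apache License 2.0.
# Wording and names are illustrative, not taken from XTuner or OpenRLHF.
#
# Copyright (c) XTuner Contributors.
#
# This file contains code adapted from OpenRLHF
# (https://github.com/OpenLLMAI/OpenRLHF), Copyright (c) OpenRLHF Contributors,
# and is used under the Apache License, Version 2.0.
# Modifications from the original are noted where they occur.
```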

Related PRs:
#736
#764

Thank you

@pppppM
Copy link
Collaborator

pppppM commented Jun 15, 2024

Hi, @hijkzzz

I’m the maintainer of XTuner, and I am communicating with the developers of PR #764.
They will respond to you as soon as possible.


hui-zhao-1 commented Jun 17, 2024

Hi, @hijkzzz

Thank you for your interest in and feedback on xtuner-rlhf. We have indeed noticed that OpenRLHF's design accelerates the RLHF process by using separate training and inference engines, and projects such as ATorch and ColossalChat adopt similar ideas. Beyond this widely adopted separate-engine design, our core project goals are twofold. First, we aim to support RLHF algorithm innovation and exploration beyond standard PPO by decoupling algorithms from execution in our architecture. Second, we strive to flexibly support various advanced training and inference technologies, including our self-developed training and inference engines.
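
To make the separate-engine design discussed here concrete, a minimal sketch using Ray actors might look like the following; the class and method names are hypothetical placeholders, and the code is not taken from XTuner, OpenRLHF, or any other project mentioned in this thread:

```python
# Minimal sketch of the decoupled RLHF design under discussion: a dedicated
# inference actor produces rollouts while a separate training actor updates
# the policy. All names here are hypothetical placeholders.
import ray

ray.init(ignore_reinit_error=True)


@ray.remote
class InferenceEngine:
    """Stands in for a vLLM-style generation engine."""

    def generate(self, prompts):
        # A real engine would run batched decoding here.
        return [p + " <response>" for p in prompts]


@ray.remote
class PolicyTrainer:
    """Stands in for the training engine that runs PPO-style updates."""

    def step(self, rollouts):
        # A real trainer would score rollouts, compute advantages, update the
        # policy, and then push fresh weights back to the inference engine.
        return {"num_rollouts": len(rollouts)}


engine = InferenceEngine.remote()
trainer = PolicyTrainer.remote()

rollouts = ray.get(engine.generate.remote(["prompt 1", "prompt 2"]))
print(ray.get(trainer.step.remote(rollouts)))
```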

Regarding the “refactoring” you mentioned: our open-source version is an iterative evolution of an RLHF framework that has been used internally since May 2023, is built on Ray, and was long based on InternEvo internally. For the convenience of the open-source community, we first released a version supporting Hugging Face. The project does not directly use code from OpenRLHF, ATorch, or ColossalChat, and we can provide more technical details and explanations as needed. In our analysis and evaluations, we referenced test cases from projects such as OpenRLHF and DeepSpeed-Chat; these cases were selected to facilitate public discussion and analysis.

Our PR is currently still under review. We will publicly disclose more detailed information in our design documents, usage instructions, and README, fully referencing and analyzing the designs of OpenRLHF, ColossalChat, ATorch, and other projects, and we will clearly acknowledge the inspiration and references drawn from all of these open-source projects. Please stay tuned and feel free to discuss any questions with us.

The reference links for the related projects mentioned above are as follows:
colossalchat: https://github.com/hpcaitech/ColossalAI/tree/main/applications/ColossalChat/coati/ray#detach-experience-makers-and-trainers
open-rlhf: https://github.com/OpenLLMAI/OpenRLHF
atorch: https://github.com/intelligent-machine-learning/dlrover/tree/master/atorch


hijkzzz commented Jun 27, 2024


I carefully reviewed your PR, and essentially 70%+ of the core code is copied from OpenRLHF and then refactored, including the reward model, critic model, Ray actor group, and vLLM wrapper code, and even the eos_indices code.

Some reference links:

https://github.com/InternLM/xtuner/pull/736/files#diff-9584bcc9d6351ecc10ecac3d46771b2f3746e0d633a1df019520555e1c6a827e

https://github.com/InternLM/xtuner/pull/736/files#diff-bcda64dfc69f34ac62ea769913ddfe3113422ab0780777e7c06ee2612c697bcdR128

https://github.com/InternLM/xtuner/pull/736/files#diff-6f9c7e972078d28b603ce55f4d010d42ae619d22fb512bf80ccd83543496112b

https://github.com/InternLM/xtuner/pull/736/files#diff-af5dff56e4d10af30b0f522405d8be5eb87e894d3d3844c3fb3cd5a2d37a5d10

The Ray module of ColossalChat was also developed by OpenLLMAI members, and ATorch likewise copied the idea from OpenRLHF.
What I want to make clear is that we (OpenLLMAI) are the original creators.

Thanks


hui-zhao-1 commented Jun 27, 2024

Thank you for your follow-up comments; we would like to further clarify our position with the following response:

  1. Both XTuner and OpenRLHF are open-source projects under the Apache License 2.0. Therefore, for the parts of this PR suspected of referencing OpenRLHF's code, we proactively added copyright and modification statements to the project three weeks ago in accordance with the license's requirements, and expressed clear gratitude for OpenRLHF's contributions in the relevant documentation PR:
    [screenshot: acknowledgement of OpenRLHF in the documentation PR]

  2. We apologize for the inaccuracies in our response on June 17th: some parts of the PR code were indeed inspired by different open-source projects, including OpenRLHF. As mentioned in the first point, all references were clearly marked according to Apache License 2.0 standards before this reply (the modification timestamp is not accurate due to a rebase). If anything is missing, please let us know.

  3. Lastly, as mentioned in the previous response, we have not claimed the training/inference separation architecture as our original innovation. We are aware that many projects have used similar methods and have drawn on different technologies to meet business needs, including algorithm elasticity and loose coupling of the execution layer. If these projects were inspired by your original innovation, as you mentioned in your reply, we express our gratitude for that as well.


hijkzzz commented Jun 27, 2024


I think you should double-check. There are many instances of copied-and-refactored code that do not cite OpenRLHF, such as:

init_process_group in
https://github.com/InternLM/xtuner/pull/736/files#diff-6f9c7e972078d28b603ce55f4d010d42ae619d22fb512bf80ccd83543496112b

logprobs_from_logits and -num_actions (the variable name is not even changed; a generic sketch of this pattern follows after this list) in
https://github.com/InternLM/xtuner/pull/736/files#diff-065ca23363998bb9c4573b59f4f180022537aceda3f2c86eb35466133e648fad

Ray net utils
https://github.com/InternLM/xtuner/pull/736/files#diff-a1347e82b229542974a6d29f83c8347ccbc1105d687787ae56243b9216594aa8

Ray actor group
https://github.com/InternLM/xtuner/pull/736/files#diff-8b26aef1c03d548cf849c7dd813cbcfaafe321e4b6b814f67564ac164f8635dc

Ray utils
https://github.com/InternLM/xtuner/pull/736/files#diff-8359acc86af50071a076dfdbc9c506704c2e2ff456bc1fd4558bf09344e50c26

policy_model_server.py is also refactored from OpenRLHF:
xtuner/rlhf/model_server/policy_model_server.py

The same reward-clip trick and reward-computation logic as OpenRLHF
https://github.com/InternLM/xtuner/pull/736/files#diff-bcda64dfc69f34ac62ea769913ddfe3113422ab0780777e7c06ee2612c697bcd

The get_tokenizer function is also refactored from OpenRLHF
https://github.com/InternLM/xtuner/pull/736/files#diff-14d4afd16a8b8f6f8447bd2e55a9836e029c51111a8fdaddaaf4a8cd1820b7c4

Overall, XTuner RLHF is basically a refactoring of OpenRLHF.
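
For context on the logprobs_from_logits / num_actions pattern referenced above, here is a generic sketch of how this idiom typically appears in PPO-style RLHF code; it is illustrative only and is not copied from either repository:

```python
# Generic sketch (illustrative names only): gather per-token log-probabilities
# from the logits, then keep only the last `num_actions` positions, i.e. the
# generated response tokens used for the PPO update.
import torch
import torch.nn.functional as F


def logprobs_from_logits(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Gather the log-probability assigned to each label token."""
    log_probs = F.log_softmax(logits, dim=-1)
    return log_probs.gather(dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)


# `sequences` holds prompt + response token ids; logits at position t predict
# token t+1, so logits and labels are shifted by one, and the final
# `num_actions` entries correspond to the generated response tokens.
batch, seq_len, vocab, num_actions = 2, 16, 100, 6
sequences = torch.randint(vocab, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab)

all_logprobs = logprobs_from_logits(logits[:, :-1, :], sequences[:, 1:])
action_logprobs = all_logprobs[:, -num_actions:]  # response-token log-probs only
```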


hijkzzz commented Jun 27, 2024

Such large-scale copying and refactoring without citing OpenRLHF's copyright goes against the spirit of open source and the Apache license, especially since it was previously stated that XTuner RLHF used no code from OpenRLHF and had been used internally since 2023. This is lying.


pppppM commented Jun 27, 2024

Hi, @hijkzzz

I am the maintainer of XTuner.

Thanks a lot for your feedback on #736 and #764. I carefully compared #736 with OpenRLHF's code today, and I recognize that there are indeed many highly similar parts, yet the source project was not acknowledged.

First, I would like to offer my sincere apologies, as this PR has caused trouble for you and the developers of OpenRLHF.

XTuner aims to provide an efficient and easy-to-use toolkit for LLM finetuning, and welcomes contributions from internal teams and the community.

Some first-time contributors to open-source projects did not follow the license and made this mistake. I have discussed this with the developers of the PR, and we agreed to suspend its development and start a strict internal review. PRs that do not comply with the Apache 2.0 license will not be merged.

XTuner benefits from excellent open-source projects like DeepSpeed and OpenRLHF, and we will do our part to create a healthier open-source ecosystem.
