per-token KL penalty from the SFT model while doing the PPO training #3354

Unanswered
MXuer asked this question in Q&A
Replies: 0 comments


This discussion was converted from issue #2608 on June 09, 2023 11:46.