
Different implementations of RMSNorm #1240

Open
1 of 2 tasks
trundleyrg opened this issue May 28, 2024 · 0 comments
trundleyrg commented May 28, 2024

System Info

Python version: 2.12

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

import torch


class RMSNorm(torch.nn.Module):
    def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None, **kwargs):
        super().__init__()
        # torch.empty allocates the scale without initializing it: the values are
        # whatever happens to be in memory until a checkpoint (or an explicit
        # reset) overwrites them.
        self.weight = torch.nn.Parameter(torch.empty(normalized_shape, device=device, dtype=dtype))
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor):
        input_dtype = hidden_states.dtype
        # Compute the mean square in float32 for numerical stability.
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
        # Apply the learnable scale and cast back to the original dtype.
        return (self.weight * hidden_states).to(input_dtype)
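
For reference, a minimal usage sketch (the hidden size of 4 is an arbitrary choice for illustration); it shows that the weight created by torch.empty carries arbitrary values until real weights are loaded:

import torch

norm = RMSNorm(normalized_shape=4)
print(norm.weight)  # arbitrary, uninitialized values left by torch.empty
x = torch.randn(2, 4)
print(norm(x))      # x is RMS-normalized, then scaled by the (uninitialized) weight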

In chatglm's RMSNorm implementation, the weight is created with torch.empty, i.e. with random/uninitialized values, whereas llama's RMSNorm implementation initializes it with torch.ones, i.e. all ones (see the sketch below). Was torch.empty a deliberate choice in chatglm, for example to give the scale coefficients a random initialization?
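
For comparison, here is a minimal sketch of the llama-style RMSNorm (modeled on the LlamaRMSNorm module in Hugging Face transformers; written from memory, so treat it as an approximation rather than the exact upstream code):

import torch


class LlamaStyleRMSNorm(torch.nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        # All-ones init: at initialization the layer normalizes without rescaling.
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states: torch.Tensor):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
        return (self.weight * hidden_states).to(input_dtype)

Note that once a pretrained checkpoint is loaded, either initialization is overwritten, so the choice mainly matters when training from scratch.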

Expected behavior

Could you explain the advantages and disadvantages of the two implementations?

zRzRzRzRzRzRzR self-assigned this May 29, 2024