Motivation
In order to support FasterDecoding/Medusa for LMDeploy, we may need
1. Medusa weights conversion
2. Medusa weights loading
3. Porting FasterDecoding/Medusa Heads code with LMDeploy components and utilities
4. Porting `generate_candidates` and `evaluate_posterior`
5. Integrating with LlamaBatch
Before the Chinese New Year, @lzhangzz and I briefly discussed the definitions of weights conversion, loading, head porting, and integration with LlamaBatch.
Using FasterDecoding/medusa-vicuna-13b-v1.3 as an example, here are the details of the Medusa weights conversion:
The keys for the FasterDecoding/medusa-vicuna-13b-v1.3 weights are as follows:
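The exact listing is not reproduced here, but the keys can be inspected with a minimal sketch like the one below; the checkpoint filename `medusa_lm_head.pt` is an assumption based on the upstream FasterDecoding/Medusa release layout.

```python
# Sketch: list the Medusa head weight keys and shapes.
# Assumption: the heads are stored in medusa_lm_head.pt, as in the
# FasterDecoding/Medusa release; adjust the path to your local copy.
import torch

state_dict = torch.load("medusa-vicuna-13b-v1.3/medusa_lm_head.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))

# With medusa_num_heads = 5 and medusa_num_layers = 1, the keys typically follow
# "<head>.<layer>.linear.weight" / "<head>.<layer>.linear.bias" for each ResBlock
# and "<head>.<medusa_num_layers>.weight" for each head's output projection.
```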
In brief, in this example `medusa_num_heads` is 5 and `medusa_num_layers` is 1. To distinguish these weights from those of the base model and to support tensor parallelism, the naming convention will be modified when saving, as follows:
They are also saved in the `workspace/triton_models/weights` directory. The overall code implementation is located at
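Putting the pieces above together, a hypothetical conversion sketch might look like the following; the `medusa.` prefix, the per-rank file suffix, and the raw fp16 file format are placeholder assumptions for illustration, not the convention referred to above.

```python
# Sketch: rename Medusa head weights and split them for tensor parallelism.
# The "medusa." prefix, the ".{rank}.weight" suffix and the raw fp16 file
# format are hypothetical placeholders; the real convention is whatever the
# LMDeploy converter defines, not this snippet.
import os
import torch

def convert_medusa_heads(state_dict, out_dir, tp=2):
    os.makedirs(out_dir, exist_ok=True)
    for name, tensor in state_dict.items():
        tensor = tensor.half()  # fp16 only, matching the POC scope
        new_name = f"medusa.{name}"  # distinguish from base-model weights
        if name.endswith(".weight") and tensor.dim() == 2:
            # Split 2-D projection weights along the output dimension.
            chunks = torch.chunk(tensor, tp, dim=0)
        else:
            # Replicate biases / 1-D tensors on every rank.
            chunks = [tensor] * tp
        for rank, chunk in enumerate(chunks):
            chunk.contiguous().numpy().tofile(
                os.path.join(out_dir, f"{new_name}.{rank}.weight"))

heads = torch.load("medusa-vicuna-13b-v1.3/medusa_lm_head.pt", map_location="cpu")
convert_medusa_heads(heads, "workspace/triton_models/weights", tp=2)
```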
In the current version, to complete the proof of concept (POC), we will first implement fp16 for LlamaForCausalLM, and subsequently expand to other types such as fp32, bf16, and int8.
@irexyc @grimoire @lzhangzz @lvhan028 Do you have any suggestions? Thanks.
In addition to weight conversion, we will open separate issues detailing the subsequent steps. Stay tuned.
Related resources
No response
Additional context
No response