
make install insufficient for running llama3-8B-Instruct #484

Open
fozziethebeat opened this issue May 22, 2024 · 4 comments

Labels
documentation Improvements or additions to documentation
@fozziethebeat

System Info

lorax-launcher-env output:

Target: x86_64-unknown-linux-gnu
Cargo version: 1.74.0
Commit sha: 97ede5207a4eeb5a9a03dea33b0fb472b762496d

cargo version output:

cargo 1.74.0 (ecb9851af 2023-10-18)

Model being used: meta-llama/Meta-Llama-3-70B-Instruct

GPUs: 8x A100 on CoreWeave (can't get more details since I accidentally broke my NVIDIA setup).

CUDA is 12.2, I believe.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Clone Locally
  2. Run make install
  3. Run lorax-launcher --model-id meta-llama/Meta-Llama-3-70B-Instruct --port 8080

The initial failure reports that the dropout_layer_norm module can't be found.
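For reference, a quick way to see which of the optional CUDA kernel modules are importable in the current environment. This is just a diagnostic sketch: the module names are partly assumptions, taken from the error above, the server Makefile targets, and the punica_kernels path mentioned later in this issue.

```bash
# Diagnostic sketch: check which optional kernel modules the Python env can import.
# Module names are assumptions based on the errors in this issue and the
# server Makefile targets, not an official list.
for mod in dropout_layer_norm flash_attn vllm punica_kernels; do
    python -c "import ${mod}" 2>/dev/null \
        && echo "OK      ${mod}" \
        || echo "MISSING ${mod}"
done
```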

From reading the Docker instructions, I believe the full installation is something like:

  1. clone locally
  2. cd lorax
  3. make install
  4. cd server
  5. make install-flash-attn
  6. make install-flash-attn-v2
  7. make install-vllm

However, when doing this, the install-vllm step ran into an issue: it expects torch==2.2.1, while make install actually runs pip install torch==2.2.0, which breaks the vLLM step.
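For completeness, here is that full sequence written out as one script. The torch re-pin before the kernel builds is only a guess at a workaround for the 2.2.0 vs 2.2.1 mismatch, not a documented step.

```bash
# Sketch of the full from-source install, pieced together from the steps above
# and the Dockerfile. The torch re-pin before install-vllm is an assumed
# workaround for the 2.2.0 vs 2.2.1 mismatch, not an officially documented step.

# 1. clone the repo locally and enter it (URL omitted; it is the repo this issue lives in)
cd lorax

# 2. base install (pins torch==2.2.0)
make install

# 3. optional CUDA kernels, built from the server directory
cd server
pip install torch==2.2.1   # assumed workaround: match the version install-vllm expects
make install-flash-attn
make install-flash-attn-v2
make install-vllm
```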

Expected behavior

The following steps work successfully:

  1. Clone Locally
  2. Run make install
  3. Run lorax-launcher --model-id meta-llama/Meta-Llama-3-70B-Instruct --port 8080

Alternatively, step 2 could be something like make install-comprehensive, which would pull in the full vLLM and flash-attention set of dependencies.

@fozziethebeat (Author)

I'll note that the Docker install worked perfectly. I just happen to be testing in an environment where I can't run Docker.

@fozziethebeat (Author)

I successfully got everything working by installing all of the low-level libraries.

Further, I found that some LoRAs trigger a code path that depends on punica_kernels.sgmv_cutlass_tmp_size in all cases. Supporting that required a few additional installation steps:

  1. At the repo root: git submodule sync, then git submodule update --init
  2. cd server/punica_modules
  3. python setup.py install

After that, my rank-256 LoRA ran successfully.
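For anyone repeating this, the same steps as a single sequence. The final import check is only a sanity-check sketch I'm adding here; the module and attribute names come from the error path mentioned above, not from official docs.

```bash
# Sketch of the Punica kernel build steps above, run from the repo root.
git submodule sync
git submodule update --init
cd server/punica_modules
python setup.py install

# Sanity check (assumed names, taken from the error path above):
python -c "import punica_kernels; print(hasattr(punica_kernels, 'sgmv_cutlass_tmp_size'))"
```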

@tgaddair added the documentation label (Improvements or additions to documentation) on May 23, 2024
@magdyksaleh (Collaborator)

Need to update the docs to reflect the steps you took to get it to work. Are you blocked on anything?

@magdyksaleh self-assigned this on May 23, 2024
@fozziethebeat (Author)

Not blocked, but updated docs (or a unified install target) would help a lot.

Now that I've figured it out (a lot of it came from semi-copying the Dockerfile), I've unblocked myself, but I imagine others who want to repeat these steps will probably stumble. I'll definitely try to avoid having to redo this, though.
