Code for the ICML 2024 paper "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models".

MeZO-SVRG: Variance-Reduced Zeroth-Order Methods for Fine-Tuning LLMs

This repository implements the Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient (MeZO-SVRG) algorithm for fine-tuning pre-trained Hugging Face language models. As baselines, we also implement the memory-efficient zeroth-order optimizer (MeZO) and first-order SGD (FO-SGD). The repository is written in PyTorch and leverages the PyTorch Lightning framework.
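
For orientation, below is a minimal sketch (not the repository's code) of the two-point zeroth-order gradient estimator that MeZO-style methods build on. The names spsa_grad_scale and loss_fn are illustrative assumptions; the repository's implementation follows the paper.

  import torch

  def spsa_grad_scale(model, loss_fn, batch, eps=1e-3, seed=0):
      """Two-point (SPSA-style) zeroth-order estimate of the directional
      derivative along a random direction z. Regenerating z from `seed`
      means it never has to be stored, which is the memory trick behind
      MeZO-style methods."""

      def perturb(scale):
          # Re-create the same per-parameter z from the shared seed.
          torch.manual_seed(seed)
          for p in model.parameters():
              z = torch.randn_like(p)
              p.data.add_(z, alpha=scale * eps)

      with torch.no_grad():
          perturb(+1)                     # theta + eps * z
          loss_plus = loss_fn(model, batch)
          perturb(-2)                     # theta - eps * z
          loss_minus = loss_fn(model, batch)
          perturb(+1)                     # restore theta

      return (loss_plus - loss_minus) / (2 * eps)

The returned scalar multiplies the same direction z (regenerated from the seed) to form the parameter update, so neither gradients nor perturbation vectors need to be kept in memory.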

Installation

To install the relevant Python environment, use the following commands:

  conda create --name zo_opt python=3.9
  conda activate zo_opt
  python -m pip install -r requirements.txt

File Overview

This repository implements the MeZO-SVRG algorithm and enables fine-tuning a range of language models on the GLUE benchmark. To run experiments, execute the 'finetune_llm.sh' bash script.

The script supports the following models:

  1. 'distilbert-base-cased'
  2. 'roberta-large'
  3. 'gpt2-xl'
  4. 'facebook/opt-2.7b'
  5. 'facebook/opt-6.7b'

The script supports the following GLUE tasks:

  1. MNLI
  2. QNLI
  3. SST-2
  4. CoLA

Indicate the fine-tuning algorithm by passing one of {'FO', 'ZO', 'ZOSVRG'}. The exact hyperparameter settings used to generate the tables and figures in the paper are provided in the paper's appendix.
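
As a rough illustration of what distinguishes 'ZOSVRG' from 'ZO', the sketch below writes the variance-reduced estimator in its classical SVRG form for a single flattened parameter tensor. The function names, the dense-tensor formulation, and the single shared perturbation direction are illustrative assumptions; the repository's in-place, memory-efficient implementation and exact update schedule follow the paper.

  import torch

  def zo_svrg_step(loss, theta, anchor, full_grad_anchor, batch, lr=1e-4, eps=1e-3):
      """One schematic variance-reduced zeroth-order step:
          g = g_B(theta) - g_B(anchor) + g_full(anchor),
      where the g_B terms are two-point ZO estimates on the current minibatch
      sharing one random direction z, and full_grad_anchor is a (ZO) full-batch
      estimate computed at the anchor point and reused between anchor refreshes."""
      with torch.no_grad():
          z = torch.randn_like(theta)
          g_theta = (loss(theta + eps * z, batch)
                     - loss(theta - eps * z, batch)) / (2 * eps) * z
          g_anchor = (loss(anchor + eps * z, batch)
                      - loss(anchor - eps * z, batch)) / (2 * eps) * z
          return theta - lr * (g_theta - g_anchor + full_grad_anchor)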

Citation

Please consider citing our paper if you use our code:

@misc{gautam2024variancereduced,
      title={Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models}, 
      author={Tanmay Gautam and Youngsuk Park and Hao Zhou and Parameswaran Raman and Wooseok Ha},
      year={2024},
      eprint={2404.08080},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.