Custom MT5 model ONNX export not able to run #1906

Open

JamesBowerXanda opened this issue Jun 14, 2024 · 0 comments

Labels
bug Something isn't working

System Info

Python == 3.10.14

optimum == 1.20.0
transformers == 4.41.2
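
For anyone reproducing, the environment can presumably be recreated with the following (assuming the onnxruntime extra of optimum, which provides the ORT classes used below):

pip install "optimum[onnxruntime]==1.20.0" transformers==4.41.2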

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

The issue I am having is with exporting the mt5-small model to ONNX format: once exported and quantized, the model cannot be run using the sample code. I am actually trying to do this with a private model I fine-tuned, but I have reproduced the issue with the public model.

Below is the script I use to export the model. It essentially just calls optimum-cli export onnx and optimum-cli onnxruntime quantize. The script also uses a Hugging Face token, since I wrote it for a private repository, but you can remove that.

from huggingface_hub import login
import argparse
import logging
import os


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="google/mt5-small", help="Hugging Face model id or path")
    parser.add_argument("--hf_token", type=str, default=None, help="Hugging Face API token")
    # argparse's type=bool would treat any non-empty string (even "False") as True,
    # so a store_true flag is used instead
    parser.add_argument("--quantize", action="store_true", help="Quantize the model")
    parser.add_argument("--full_precision_output_dir", type=str, default="onnx_model", help="Directory to save full precision model")
    parser.add_argument("--quantized_output_dir", type=str, default="onnx_model_quantized", help="Directory to save quantized model")
    return parser.parse_args()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    logging.info("Parsing arguments...\n")
    args = parse_args()
    model = args.model
    hf_token = args.hf_token
    quantize = args.quantize
    full_precision_output_dir = args.full_precision_output_dir
    quantized_output_dir = args.quantized_output_dir

    # Only log in when a token is supplied; login(token=None) would prompt interactively
    if hf_token:
        logging.info("Logging in to Hugging Face...\n")
        os.environ["HF_TOKEN"] = hf_token
        login(token=hf_token)

    logging.info("Checking if ONNX model directory exists...\n")
    if os.path.exists(full_precision_output_dir):
        logging.info("ONNX model directory already exists. Deleting it...\n")
        os.system(f"rm -r {full_precision_output_dir}")

    logging.info("Exporting model to ONNX...\n")
    cmd = f"optimum-cli export onnx --model {model} {full_precision_output_dir}"
    logging.info(f"Running command: {cmd}")
    os.system(cmd)

    if quantize:
        logging.info("Quantizing model...\n")

        logging.info("Checking if quantized model directory exists...\n")
        if os.path.exists(quantized_output_dir):
            logging.info("Quantized model directory already exists. Deleting it...\n")
            os.system(f"rm -r {quantized_output_dir}")

        cmd = f"optimum-cli onnxruntime quantize --onnx_model {full_precision_output_dir} --avx512 -o {quantized_output_dir}"
        logging.info(f"Running command: {cmd}")
        os.system(cmd)
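
For reference, the same export and quantization can presumably also be done through the Python API rather than shelling out. This is only a sketch based on the documented ORTModelForSeq2SeqLM/ORTQuantizer interfaces; the per-file loop is there because the seq2seq export produces separate encoder and decoder graphs:

from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export to ONNX (equivalent to `optimum-cli export onnx`)
model = ORTModelForSeq2SeqLM.from_pretrained("google/mt5-small", export=True)
model.save_pretrained("onnx_model")

# Dynamic AVX512 quantization (equivalent to `optimum-cli onnxruntime quantize --avx512`)
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)
for file_name in ["encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"]:
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=file_name)
    quantizer.quantize(save_dir="onnx_model_quantized", quantization_config=qconfig)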

I then attempt to run generation with the model using:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_path = "<MODEL_DIR>"  # directory containing the exported (quantized) model

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = ORTModelForSeq2SeqLM.from_pretrained(model_path)

inputs = tokenizer("My name is Eustache and I like to", return_tensors="pt")

gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens)

or

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import MT5Tokenizer, pipeline

model_path = "<MODEL_DIR>"  # directory containing the exported (quantized) model

ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_path)
tokenizer = MT5Tokenizer.from_pretrained(model_path)
pipe = pipeline("translation_en_to_no", model=ort_model, tokenizer=tokenizer)

pipe("Hello, how are you?")

This causes the error:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: past_key_values.7.encoder.value for the following indices
 index: 3 Got: 85 Expected: 64
 Please fix either the inputs/outputs or the model.
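
For debugging, it may help to print the dimensions the quantized decoder-with-past graph actually expects for its cache inputs. A minimal sketch, assuming the default _quantized.onnx file naming in the output directory:

import onnxruntime as ort

# List the expected shapes of the past_key_values inputs baked into the graph
sess = ort.InferenceSession("onnx_model_quantized/decoder_with_past_model_quantized.onnx")
for inp in sess.get_inputs():
    if "past_key_values" in inp.name:
        print(inp.name, inp.shape)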

Expected behavior

The quantized versions of the ONNX model should run.
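
A possible (untested) workaround sketch while this is investigated: since the mismatch is on a past_key_values input, loading the model with the cache disabled avoids the decoder-with-past graph entirely, at the cost of slower generation:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# use_cache=False skips decoder_with_past_model.onnx, so its cache inputs are never fed
model = ORTModelForSeq2SeqLM.from_pretrained(model_path, use_cache=False)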
