Custom MT5 model ONNX export not able to run #1906

Open

JamesBowerXanda opened this issue Jun 14, 2024 · 0 comments

Labels
bug Something isn't working

System Info

Python == 3.10.14

optimum == 1.20.0
transformers == 4.41.2
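
For anyone reproducing, the environment can presumably be recreated with the following (assuming the onnxruntime extra of optimum, which provides the ORT classes used below):

pip install "optimum[onnxruntime]==1.20.0" transformers==4.41.2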

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

The issue I am having is with exporting the mt5-small model to ONNX format: once exported and quantized, the model cannot be run using the sample code. I am actually trying to do this with a private model I fine-tuned, but I have reproduced the issue with the public model.

Below is the script I use to export the model. It essentially just calls optimum-cli export onnx and optimum-cli onnxruntime quantize. The script also uses a Hugging Face token, since I wrote it for a private repository, but you can remove that.

from huggingface_hub import login
import argparse
import logging
import os


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="google/mt5-small", help="Hugging Face model id or path")
    parser.add_argument("--hf_token", type=str, default=None, help="Hugging Face API token")
    # argparse's type=bool would treat any non-empty string (even "False") as True,
    # so a store_true flag is used instead
    parser.add_argument("--quantize", action="store_true", help="Quantize the model")
    parser.add_argument("--full_precision_output_dir", type=str, default="onnx_model", help="Directory to save full precision model")
    parser.add_argument("--quantized_output_dir", type=str, default="onnx_model_quantized", help="Directory to save quantized model")
    return parser.parse_args()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    logging.info("Parsing arguments...\n")
    args = parse_args()
    model = args.model
    hf_token = args.hf_token
    quantize = args.quantize
    full_precision_output_dir = args.full_precision_output_dir
    quantized_output_dir = args.quantized_output_dir

    # Only log in when a token is supplied; login(token=None) would prompt interactively
    if hf_token:
        logging.info("Logging in to Hugging Face...\n")
        os.environ["HF_TOKEN"] = hf_token
        login(token=hf_token)

    logging.info("Checking if ONNX model directory exists...\n")
    if os.path.exists(full_precision_output_dir):
        logging.info("ONNX model directory already exists. Deleting it...\n")
        os.system(f"rm -r {full_precision_output_dir}")

    logging.info("Exporting model to ONNX...\n")
    cmd = f"optimum-cli export onnx --model {model} {full_precision_output_dir}"
    logging.info(f"Running command: {cmd}")
    os.system(cmd)

    if quantize:
        logging.info("Quantizing model...\n")

        logging.info("Checking if quantized model directory exists...\n")
        if os.path.exists(quantized_output_dir):
            logging.info("Quantized model directory already exists. Deleting it...\n")
            os.system(f"rm -r {quantized_output_dir}")

        cmd = f"optimum-cli onnxruntime quantize --onnx_model {full_precision_output_dir} --avx512 -o {quantized_output_dir}"
        logging.info(f"Running command: {cmd}")
        os.system(cmd)
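
For reference, the same export and quantization can presumably also be done through the Python API rather than shelling out. This is only a sketch based on the documented ORTModelForSeq2SeqLM/ORTQuantizer interfaces; the per-file loop is there because the seq2seq export produces separate encoder and decoder graphs:

from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export to ONNX (equivalent to `optimum-cli export onnx`)
model = ORTModelForSeq2SeqLM.from_pretrained("google/mt5-small", export=True)
model.save_pretrained("onnx_model")

# Dynamic AVX512 quantization (equivalent to `optimum-cli onnxruntime quantize --avx512`)
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)
for file_name in ["encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"]:
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=file_name)
    quantizer.quantize(save_dir="onnx_model_quantized", quantization_config=qconfig)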

I then attempt to run generation with the model using:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_path = "<MODEL_DIR>"  # directory containing the exported (quantized) model

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = ORTModelForSeq2SeqLM.from_pretrained(model_path)

inputs = tokenizer("My name is Eustache and I like to", return_tensors="pt")

gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens)

or

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import MT5Tokenizer, pipeline

model_path = "<MODEL_DIR>"  # directory containing the exported (quantized) model

ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_path)
tokenizer = MT5Tokenizer.from_pretrained(model_path)
pipe = pipeline("translation_en_to_no", model=ort_model, tokenizer=tokenizer)

pipe("Hello, how are you?")

This causes the error:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: past_key_values.7.encoder.value for the following indices
 index: 3 Got: 85 Expected: 64
 Please fix either the inputs/outputs or the model.
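
For debugging, it may help to print the dimensions the quantized decoder-with-past graph actually expects for its cache inputs. A minimal sketch, assuming the default _quantized.onnx file naming in the output directory:

import onnxruntime as ort

# List the expected shapes of the past_key_values inputs baked into the graph
sess = ort.InferenceSession("onnx_model_quantized/decoder_with_past_model_quantized.onnx")
for inp in sess.get_inputs():
    if "past_key_values" in inp.name:
        print(inp.name, inp.shape)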

Expected behavior

The quantized versions of the ONNX model should run.
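
A possible (untested) workaround sketch while this is investigated: since the mismatch is on a past_key_values input, loading the model with the cache disabled avoids the decoder-with-past graph entirely, at the cost of slower generation:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# use_cache=False skips decoder_with_past_model.onnx, so its cache inputs are never fed
model = ORTModelForSeq2SeqLM.from_pretrained(model_path, use_cache=False)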
