Who can help?
@michaelbenayoun

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)
The issue I am having is with exporting the mt5-small model to ONNX format: after the export, the model cannot be run with the sample inference code. I am actually trying to do this with a private model I fine-tuned, but I have reproduced the issue with the public model.
Below is the script I use to export the model. It essentially just calls optimum-cli export onnx and optimum-cli onnxruntime quantize. The script also takes a Hugging Face token, since I wrote it for a private repository, but you can remove that.
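Stripped of the wrapper, the two commands the script ends up running are the following (using the script's default model and output directories):

optimum-cli export onnx --model google/mt5-small onnx_model
optimum-cli onnxruntime quantize --onnx_model onnx_model --avx512 -o onnx_model_quantized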
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig
from transformers import AutoTokenizer
from huggingface_hub import login
import argparse
import logging
import os


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="google/mt5-small", help="Hugging Face model id or path")
    parser.add_argument("--hf_token", type=str, default=None, help="Hugging Face API token")
    parser.add_argument("--quantize", type=bool, default=False, help="Quantize the model")
    parser.add_argument("--full_precision_output_dir", type=str, default="onnx_model", help="Directory to save full precision model")
    parser.add_argument("--quantized_output_dir", type=str, default="onnx_model_quantized", help="Directory to save quantized model")
    return parser.parse_args()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logging.info("Parsing arguments...\n")
    args = parse_args()

    model = args.model
    hf_token = args.hf_token
    quantize = args.quantize
    full_precision_output_dir = args.full_precision_output_dir
    quantized_output_dir = args.quantized_output_dir

    logging.info("Logging in to Hugging Face...\n")
    if args.hf_token:
        os.environ["HF_TOKEN"] = hf_token
        login(token=hf_token)

    logging.info("Checking if ONNX model directory exists...\n")
    if os.path.exists(full_precision_output_dir):
        logging.info("ONNX model directory already exists. Deleting it...\n")
        os.system(f"rm -r {full_precision_output_dir}")

    logging.info("Exporting model to ONNX...\n")
    cmd = f"optimum-cli export onnx --model {model} {full_precision_output_dir}"
    logging.info(f"Running command: {cmd}")
    os.system(cmd)

    if quantize:
        logging.info("Quantizing model...\n")
        logging.info("Checking if quantized model directory exists...\n")
        if os.path.exists(quantized_output_dir):
            logging.info("Quantized model directory already exists. Deleting it...\n")
            os.system(f"rm -r {quantized_output_dir}")
        cmd = f"optimum-cli onnxruntime quantize --onnx_model {full_precision_output_dir} --avx512 -o {quantized_output_dir}"
        logging.info(f"Running command: {cmd}")
        os.system(cmd)
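For reference, the same export and quantization can also be done through the Python API instead of shelling out to the CLI. This is only a minimal sketch, assuming an optimum version that supports export=True in from_pretrained and the avx512 AutoQuantizationConfig; the ONNX file names are the exporter defaults:

from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export google/mt5-small to ONNX (encoder, decoder, decoder-with-past) directly from Python.
model = ORTModelForSeq2SeqLM.from_pretrained("google/mt5-small", export=True)
model.save_pretrained("onnx_model")

# Apply dynamic AVX512 quantization to each exported ONNX file, mirroring the --avx512 CLI flag.
qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)
for onnx_file in ("encoder_model.onnx", "decoder_model.onnx", "decoder_with_past_model.onnx"):
    quantizer = ORTQuantizer.from_pretrained("onnx_model", file_name=onnx_file)
    quantizer.quantize(save_dir="onnx_model_quantized", quantization_config=qconfig)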
I then attempt to run generation with the exported model using:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model_path = <MODEL_DIR>
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = ORTModelForSeq2SeqLM.from_pretrained(model_path)
inputs = tokenizer("My name is Eustache and I like to", return_tensors="pt")
gen_tokens = model.generate(**inputs)
outputs = tokenizer.batch_decode(gen_tokens)
or
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import MT5Tokenizer, pipeline

model_path = <MODEL_DIR>
ort_model = ORTModelForSeq2SeqLM.from_pretrained(model_path)
tokenizer = MT5Tokenizer.from_pretrained(model_path)
pipe = pipeline("translation_en_to_no", model=ort_model, tokenizer=tokenizer)
pipe("Hello, how are you?")
This causes the error:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: past_key_values.7.encoder.value for the following indices
index: 3 Got: 85 Expected: 64
Please fix either the inputs/outputs or the model.
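For reference, the shape the decoder-with-past graph actually expects for these inputs can be inspected with onnxruntime. This is a minimal sketch; it assumes the default file name decoder_with_past_model.onnx that optimum-cli export onnx writes into the output directory used above:

import onnxruntime as ort

# Print the declared shapes of the past_key_values inputs of the exported decoder-with-past graph.
sess = ort.InferenceSession("onnx_model/decoder_with_past_model.onnx")
for inp in sess.get_inputs():
    if "past_key_values" in inp.name:
        print(inp.name, inp.shape)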
Expected behavior
The quantised versions of the ONNX model should run.