
onnx export for cuda does not work #1892

Open
geraldstanje opened this issue May 17, 2024 · 7 comments
Labels
bug Something isn't working

Comments


geraldstanje commented May 17, 2024

System Info

$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

$ python --version
Python 3.10.10

$ pip list
Package                   Version
------------------------- --------------
absl-py                   2.1.0
aiohttp                   3.9.5
aiosignal                 1.3.1
annotated-types           0.6.0
anyio                     4.3.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.2.0
awscrt                    0.20.9
Babel                     2.14.0
backoff                   2.2.1
beautifulsoup4            4.12.3
bleach                    6.1.0
boto3                     1.34.96
botocore                  1.34.96
cachetools                5.3.3
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
coloredlogs               15.0.1
comm                      0.2.2
contourpy                 1.2.1
cycler                    0.12.1
datasets                  2.19.1
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
dill                      0.3.8
evaluate                  0.4.2
exceptiongroup            1.2.1
executing                 2.0.1
fastapi                   0.110.0
fastjsonschema            2.19.1
filelock                  3.14.0
fire                      0.6.0
flatbuffers               24.3.25
fonttools                 4.51.0
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2024.3.1
google-auth               2.29.0
google-auth-oauthlib      1.2.0
grpcio                    1.63.0
h11                       0.14.0
huggingface-hub           0.23.0
humanfriendly             10.0
idna                      3.7
ipykernel                 6.26.0
ipython                   8.17.2
ipywidgets                8.1.1
isoduration               20.11.0
jedi                      0.19.1
Jinja2                    3.1.3
jmespath                  1.0.1
joblib                    1.4.0
json5                     0.9.25
jsonpointer               2.4
jsonschema                4.22.0
jsonschema-specifications 2023.12.1
jupyter_client            8.6.1
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.5
jupyter_server            2.14.0
jupyter_server_terminals  0.5.3
jupyterlab                4.0.6
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.1
jupyterlab_widgets        3.0.10
kiwisolver                1.4.5
lightning                 2.2.4
lightning-cloud           0.5.64
lightning-remote-profiler 0.0.6
lightning_sdk             0.1.7
lightning-utilities       0.10.1
litdata                   0.2.2
Markdown                  3.6
markdown-it-py            3.0.0
MarkupSafe                2.1.5
matplotlib                3.8.2
matplotlib-inline         0.1.7
mdurl                     0.1.2
mistune                   3.0.2
mpmath                    1.3.0
multidict                 6.0.5
multiprocess              0.70.16
nbclient                  0.10.0
nbconvert                 7.16.4
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.3
ninja                     1.11.1.1
notebook_shim             0.2.4
numpy                     1.26.2
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.4.127
nvidia-nvtx-cu12          12.1.105
oauthlib                  3.2.2
objprint                  0.2.3
onnx                      1.16.0
onnxconverter-common      1.14.0
onnxruntime-gpu           1.17.1
optimum                   1.19.2
overrides                 7.7.0
packaging                 24.0
pandas                    2.1.4
pandocfilters             1.5.1
parso                     0.8.4
pexpect                   4.9.0
pillow                    10.3.0
pip                       24.0
platformdirs              4.2.1
prometheus_client         0.20.0
prompt-toolkit            3.0.43
protobuf                  3.20.2
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   16.0.0
pyarrow-hotfix            0.6
pyasn1                    0.6.0
pyasn1_modules            0.4.0
pycparser                 2.22
pydantic                  2.7.1
pydantic_core             2.18.2
Pygments                  2.17.2
PyJWT                     2.8.0
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
python-json-logger        2.0.7
python-multipart          0.0.9
pytorch-lightning         2.2.4
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     26.0.3
referencing               0.35.1
regex                     2024.5.15
requests                  2.31.0
requests-oauthlib         2.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.7.1
rpds-py                   0.18.0
rsa                       4.9
s3transfer                0.10.1
safetensors               0.4.3
scikit-learn              1.3.2
scipy                     1.11.4
Send2Trash                1.8.3
sentence-transformers     2.7.0
sentencepiece             0.2.0
setfit                    1.0.3
setuptools                68.2.2
simple-term-menu          1.6.4
six                       1.16.0
skl2onnx                  1.16.0
sniffio                   1.3.1
soupsieve                 2.5
stack-data                0.6.3
starlette                 0.36.3
sympy                     1.12
tensorboard               2.15.1
tensorboard-data-server   0.7.2
termcolor                 2.4.0
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.3.0
tokenizers                0.19.1
tomli                     2.0.1
torch                     2.2.1+cu121
torchmetrics              1.3.1
torchvision               0.17.1+cu121
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.3
transformers              4.40.2
triton                    2.2.0
types-python-dateutil     2.9.0.20240316
typing_extensions         4.11.0
tzdata                    2024.1
uri-template              1.3.0
urllib3                   2.2.1
uvicorn                   0.29.0
viztracer                 0.16.2
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.8.0
Werkzeug                  3.0.2
wheel                     0.41.2
widgetsnbextension        4.0.10
xxhash                    3.4.1
yarl                      1.9.4

$ nvidia-smi
Fri May 17 04:59:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8               8W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Who can help?

@michaelbenayoun
@JingyaHuang
@echarlaix
@simoninithomas
@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Hi,

I trained a SetFit model on a small dataset and tried to export it to ONNX for CUDA, but the conversion does not seem to work. Can someone show me how to export it?

Which Hugging Face transformer did I use as the base model?

sentence-transformers/all-MiniLM-L6-v2

See: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Here is all the info about the trained model:

from setfit import SetFitModel

model = SetFitModel.from_pretrained("setfit-test-model")
print("model.model_head:", model.model_head)
print("model.model_body:", model.model_body)
print("model.model_body[0].auto_model:", model.model_body[0].auto_model)
print("model.model_body[0].auto_model.config:", model.model_body[0].auto_model.config)

output:

model.model_head: LogisticRegression()
model.model_body: SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
model.model_body[0].auto_model: BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 384, padding_idx=0)
    (position_embeddings): Embedding(512, 384)
    (token_type_embeddings): Embedding(2, 384)
    (LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-5): 6 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=384, out_features=384, bias=True)
            (key): Linear(in_features=384, out_features=384, bias=True)
            (value): Linear(in_features=384, out_features=384, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=384, out_features=384, bias=True)
            (LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): BertIntermediate(
          (dense): Linear(in_features=384, out_features=1536, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): BertOutput(
          (dense): Linear(in_features=1536, out_features=384, bias=True)
          (LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): BertPooler(
    (dense): Linear(in_features=384, out_features=384, bias=True)
    (activation): Tanh()
  )
)
model.model_body[0].auto_model.config: BertConfig {
  "_name_or_path": "setfit-test-model",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.40.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
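As an aside, the Pooling module in model_body above (with pooling_mode_mean_tokens=True) followed by Normalize() computes an attention-mask-weighted mean over the token embeddings and then L2-normalizes it. A minimal numpy sketch (shapes chosen to match the (2, 16, 384) validation outputs in the export logs; random data stands in for real embeddings):

```python
import numpy as np

# Sketch of mean pooling as done by Pooling(pooling_mode_mean_tokens=True):
# sum token embeddings where the attention mask is 1, divide by the count.
def mean_pool(token_embeddings, attention_mask):
    mask = attention_mask[..., None].astype(np.float32)   # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)        # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)        # avoid div by zero
    return summed / counts

rng = np.random.default_rng(0)
tok = rng.standard_normal((2, 16, 384)).astype(np.float32)  # fake token_embeddings
mask = np.ones((2, 16), dtype=np.int64)                     # no padding
sent = mean_pool(tok, mask)
sent = sent / np.linalg.norm(sent, axis=1, keepdims=True)   # the Normalize() step
print(sent.shape)  # (2, 384)
```
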

I tried to export an ONNX model for CUDA, which does not seem to work:

optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
2024-05-17 04:46:37.141067101 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 04:46:37.145604395 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:37.145622795 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
  warnings.warn(
Optimizing model...
2024-05-17 04:46:38.933947965 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:38.933972326 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.
Validating models in subprocesses...

Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 04:46:43.989705868 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 04:46:43.989729585 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
        -[✓] ONNX model output names match reference model (sentence_embedding, token_embeddings)
        - Validating ONNX Model output "token_embeddings":
                -[✓] (2, 16, 384) matches (2, 16, 384)
                -[x] values not close enough, max diff: 2.658904552459717 (atol: 1e-05)
        - Validating ONNX Model output "sentence_embedding":
                -[✓] (2, 384) matches (2, 384)
                -[x] values not close enough, max diff: 0.00038395076990127563 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.658904552459717
- sentence_embedding: max diff = 0.00038395076990127563.
 The exported model was saved at: setfit_auto_opt_O4
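For context on those "values not close enough" warnings: the O4 optimization level includes half-precision conversion, and fp16 rounding alone already exceeds the exporter's atol of 1e-05. A rough numpy sketch of the expected error scale (this simulates rounding only; it is not the exporter's actual comparison):

```python
import numpy as np

# Simulate fp16 rounding of fp32 embeddings: fp16 keeps ~3 decimal digits,
# so the rounding error is well above 1e-5 but well below 1e-2.
rng = np.random.default_rng(0)
ref = rng.standard_normal((2, 384)).astype(np.float32)
out = ref.astype(np.float16).astype(np.float32)

print(np.abs(ref - out).max())           # well above 1e-5
print(np.allclose(ref, out, atol=1e-5))  # False: fp16 cannot meet this
print(np.allclose(ref, out, atol=1e-2))  # True: rounding error only
```

This is consistent with the ~4e-4 max diff reported for sentence_embedding, so that part of the warning is expected at O4 rather than a sign of a broken export.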
⚡ ~ ls
checkpoints     setfit-test-model   setfit_onnx       sklearn_model.onnx  training.py
export_onnx.py  setfit_auto_opt_O4  setfit_onnx.onnx  train_dataset.csv   validate_onnx_model.py
⚡ ~ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
⚡ ~ rm -rf setfit_auto_opt_O4
⚡ ~ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
(The second run produced the same log as the first, again ending with:)

The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.7059507369995117
- sentence_embedding: max diff = 0.0004076659679412842.
 The exported model was saved at: setfit_auto_opt_O4

Here is the script I used to validate the generated ONNX model:

import onnxruntime

# Load the ONNX model
onnx_model_path = 'setfit_auto_opt_O4/model.onnx'
session = onnxruntime.InferenceSession(onnx_model_path)

# List the execution providers the session is actually using
providers = session.get_providers()
print(providers)

The output:

['CPUExecutionProvider']

Expected behavior

A SetFit model can be exported to ONNX for CUDA.

@geraldstanje geraldstanje added the bug Something isn't working label May 17, 2024

geraldstanje commented May 17, 2024

I installed accelerate and tried again, but the issue persists:

$ pip list
Package                   Version
------------------------- --------------
accelerate                0.30.1
(all other packages identical to the list above)


$ optimum-cli export onnx --model setfit-test-model --task feature-extraction --optimize O4 --device cuda setfit_auto_opt_O4
Framework not specified. Using pt to export the model.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.

***** Exporting submodel 1/1: SentenceTransformer *****
Using framework PyTorch: 2.2.1+cu121
Overriding 1 configuration item(s)
        - use_cache -> False
2024-05-17 16:12:43.669923443 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 4 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-05-17 16:12:43.674687159 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:43.674710116 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
Overridding for_gpu=False to for_gpu=True as half precision is available only on GPU.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/optimum/onnxruntime/configuration.py:770: FutureWarning: disable_embed_layer_norm will be deprecated soon, use disable_embed_layer_norm_fusion instead, disable_embed_layer_norm_fusion is set to True.
  warnings.warn(
Optimizing model...
2024-05-17 16:12:45.256902100 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:45.256924632 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
symbolic shape inference disabled or failed.
symbolic shape inference disabled or failed.
Configuration saved in setfit_auto_opt_O4/ort_config.json
Optimized model saved at: setfit_auto_opt_O4 (external data format: False; saved all tensor to one file: True)
Post-processing the exported models...
Deduplicating shared (tied) weights...
Validating models in subprocesses...

Validating ONNX model setfit_auto_opt_O4/model.onnx...
2024-05-17 16:12:51.207877203 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-05-17 16:12:51.207902300 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
        -[✓] ONNX model output names match reference model (token_embeddings, sentence_embedding)
        - Validating ONNX Model output "token_embeddings":
                -[✓] (2, 16, 384) matches (2, 16, 384)
                -[x] values not close enough, max diff: 2.0768027305603027 (atol: 1e-05)
        - Validating ONNX Model output "sentence_embedding":
                -[✓] (2, 384) matches (2, 384)
                -[x] values not close enough, max diff: 0.0004524439573287964 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- token_embeddings: max diff = 2.0768027305603027
- sentence_embedding: max diff = 0.0004524439573287964.
 The exported model was saved at: setfit_auto_opt_O4
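For context on the validation failure above: the O4 optimization level converts the model to float16, so some drift between the PyTorch reference and the exported model is expected, and the default atol of 1e-5 is very strict for half precision. A minimal sketch of the kind of tolerance check the validator performs (illustrative only, not optimum's actual code; the offsets mirror the max diffs reported in the log):

```python
import numpy as np

def check_outputs(reference, exported, atol=1e-5):
    """Compare a reference output against an exported-model output and
    report the maximum absolute difference, like the export validator."""
    max_diff = float(np.max(np.abs(reference - exported)))
    return max_diff <= atol, max_diff

# Dummy arrays standing in for the (2, 384) sentence_embedding outputs.
ref = np.zeros((2, 384), dtype=np.float32)
ok_strict, diff = check_outputs(ref, ref + 4.5e-4, atol=1e-5)
ok_loose, _ = check_outputs(ref, ref + 4.5e-4, atol=1e-3)
print(ok_strict, ok_loose, diff)  # the strict check fails, the looser one passes
```

So a max diff of ~4.5e-4 on sentence_embedding is a tolerance-warning issue rather than a broken export; whether it is acceptable depends on your downstream accuracy requirements.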

@amyeroberts

Hi @geraldstanje, thanks for opening an issue! Transferring to the optimum library as they handle onnx exports

@amyeroberts amyeroberts transferred this issue from huggingface/optimum May 17, 2024
@geraldstanje

@amyeroberts are you sure that's the right place?

@geraldstanje

Here is the training code for the SetFit model:

import pandas as pd
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

# Load your CSV files into Pandas DataFrames
df = pd.read_csv("train_dataset.csv")
train_df = df.iloc[:120,:]
test_df = df.iloc[120:,:]

print(train_df.shape, test_df.shape)

# Perform train-test split on the training portion
train_df, valid_df = train_test_split(train_df, test_size=0.1)

# Rename columns to match expected names and drop unnecessary columns
train_df = train_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
valid_df = valid_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')
test_df = test_df[['text', 'categories']].rename(columns={'categories': 'label'}).drop(columns=['__index_level_0__'], errors='ignore')

# Convert to Dataset objects
train_data = Dataset.from_pandas(train_df)
valid_data = Dataset.from_pandas(valid_df)
test_data = Dataset.from_pandas(test_df)


dataset = DatasetDict({
    'train': train_data,
    'validation': valid_data,
    'test': test_data
})

print("len(train_df):", len(train_df))
print("len(valid_df):", len(valid_df))
print("len(test_df):", len(test_df))

# Sample the dataset for few-shot learning
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=20)
eval_dataset = dataset["validation"]


# Define your categories and SetFit model
categories = ["aws_iam","access_management", "DOC", "NONE"]
#model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2", labels=categories)
model = SetFitModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", labels=categories)

# Training arguments
args = TrainingArguments(
    batch_size=16,
    num_epochs=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="none"
)

# Note: the columns were already renamed to 'text' and 'label' above, so no
# column_mapping is needed here. If your dataset kept its original column
# names, you would map them to SetFit's expected names and pass
# column_mapping=column_mapping to Trainer:
column_mapping = {
    "text": "text",         # dataset text column -> SetFit's expected 'text'
    "categories": "label",  # dataset label column -> SetFit's expected 'label'
}

# Trainer configuration
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy"
)

# Train the model
trainer.train()

# Evaluate the model on the validation and test datasets
validation_metrics = trainer.evaluate(eval_dataset)
print("Validation Metrics:", validation_metrics)

test_metrics = trainer.evaluate(test_data)
print("Test Metrics:", test_metrics)

# Save and push the model to the Hub (change the model name accordingly)
model.save_pretrained("setfit-test-model")
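The log at the top of this issue looks like output from optimum's ONNX exporter. A sketch of the invocation that would produce the setfit_auto_opt_O4 directory from the model saved above (assumed paths; the O4 optimization level applies fp16 conversion and requires --device cuda):

```shell
# Export the saved model body to ONNX with O4 (fp16) optimization.
optimum-cli export onnx \
  --model setfit-test-model \
  --optimize O4 \
  --device cuda \
  setfit_auto_opt_O4
```

Note that, judging by the token_embeddings/sentence_embedding outputs in the validation log, this exports the Sentence Transformer body only; the SetFit classification head would presumably need to be handled separately.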

@mfuntowicz

Hi @geraldstanje, I don't think this is the right place, haha. Would you mind opening this issue in the huggingface/optimum repository?

Closing this one here as it's not related to the scope of optimum-nvidia.

@geraldstanje

geraldstanje commented May 24, 2024

@mfuntowicz I opened the ticket under the huggingface/optimum repo, but amyeroberts moved it here! Please transfer it back and don't close the ticket!

@amyeroberts can you please move it back and reopen it?

@amyeroberts
@geraldstanje Apologies for moving it to the wrong place (this came up in my GitHub notifications and I was tagged, so I thought it was under transformers; I have nothing to do with optimum). I don't have permissions to reopen or move this issue from here. Could you create a new issue please?

@fxmarty fxmarty transferred this issue from huggingface/optimum-nvidia Jun 4, 2024
@fxmarty fxmarty reopened this Jun 4, 2024