nvidia_gpu_exporter doesn't work for NVIDIA A10 #115

Open
JoephomChen opened this issue Jul 18, 2023 · 0 comments
Describe the bug
The exporter does not work on a lab machine with an NVIDIA A10. It fails to collect GPU information.

Console output
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: 2023/07/18 07:23:20.045" query_field_name=timestamp raw_value="2023/07/18 07:23:20.045"
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: 535.54.03" query_field_name=driver_version raw_value=535.54.03
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=vgpu_driver_capability.heterogenous_multivGPU raw_value=[N/A]
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: gpu-5e10b7bc-91f1-640a-e927-963f7f82de44" query_field_name=uuid raw_value=GPU-5e10b7bc-91f1-640a-e927-963f7f82de44
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: 00000000:00:0c.0" query_field_name=pci.bus_id raw_value=00000000:00:0C.0
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=vgpu_device_capability.fractional_multiVgpu raw_value=[N/A]
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=vgpu_device_capability.heterogeneous_timeSlice_profile raw_value=[N/A]
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=vgpu_device_capability.heterogeneous_timeSlice_sizes raw_value=[N/A]
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=pcie.link.gen.hostmax raw_value=[N/A]
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: none" query_field_name=addressing_mode raw_value=None
ts=2023-07-18T07:23:20.116Z caller=exporter.go:209 level=debug error="could not parse number from value: [n/a]" query_field_name=driver_model.current raw_value=[N/A]
ts=202

Model and Version

  • GPU Model: NVIDIA A10
  • Operating System: Ubuntu Server 20.04
  • Nvidia GPU driver version: 535.54.03

Additional context
$ dpkg -l | grep nvidia
ii libnvidia-cfg1-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-525 525.125.06-0ubuntu0.20.04.3 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA libcompute package
rc libnvidia-compute-535:amd64 535.54.03-0ubuntu0.20.04.4 amd64 NVIDIA libcompute package
ii libnvidia-decode-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-525:amd64 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii nvidia-compute-utils-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA compute utilities
ii nvidia-dkms-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA DKMS package
ii nvidia-driver-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA driver metapackage
ii nvidia-driver-local-repo-ubuntu2004-515.105.01 1.0-1 amd64 nvidia-driver-local repository configuration files
ii nvidia-kernel-common-525 525.125.06-0ubuntu0.20.04.3 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA kernel source package
ii nvidia-prime 0.8.16~0.20.04.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 470.57.01-0ubuntu0.20.04.3 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA driver support binaries
ii screen-resolution-extra 0.18build1 all Extension for the nvidia-settings control panel
ii xserver-xorg-video-nvidia-525 525.125.06-0ubuntu0.20.04.3 amd64 NVIDIA binary Xorg driver

$ nvidia-smi
Tue Jul 18 08:37:46 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     Off | 00000000:00:0C.0 Off |                    0 |
|  0%   54C    P0              63W / 150W |   8594MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     28269      C   python                                     8582MiB |
+---------------------------------------------------------------------------------------+
