CSI in-line volume setup intermittently fails with config error in azstorage [account name not provided] #1340

Open
technicianted opened this issue Apr 8, 2024 · 13 comments

@technicianted

What happened:
When starting a pod with an in-line CSI volume, the mount intermittently fails with this error:

Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    59s                default-scheduler  Successfully assigned default/jabba-image-vision-example-train-1710958708-tzr76 to aks-cpu-34807744-vmss000001
  Warning  FailedMount  24s (x7 over 59s)  kubelet            MountVolume.SetUp failed for volume "build-data" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 *** blobfuse2: A new version [2.2.1] is available. Consider upgrading to latest version for bug-fixes & new features. ***
Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output: 
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.

After a few kubelet backoffs (no spec changes), it succeeds.

What you expected to happen:
Volume setup should succeed the first time.

How to reproduce it:
Create a deployment with a large number of pods (perhaps 20+) to increase the chance of hitting the failure. Use a volume definition similar to this:

  csi:
    driver: blob.csi.azure.com
    volumeAttributes:
      azureStorageAuthType: MSI
      azureStorageIdentityClientID: <clientID_here>
      storageAccountName: mystorageaccount
      containerName: mycontainer
      protocol: fuse
      mountOptions: -o allow_other --file-cache-timeout-in-seconds=120 --log-level=LOG_DEBUG --virtual-directory=false --streaming=true

Anything else we need to know?:

This is highly intermittent. For a large number of pods in a deployment, most of them succeed the first time. Others can take a few retries.

Environment:

  • CSI Driver version: mcr.microsoft.com/oss/kubernetes-csi/blob-csi:v1.21.7
  • Kubernetes version (use kubectl version): v1.26.12
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.4 LTS
  • Kernel (e.g. uname -a): 6.2.0-1019-azure
  • Install tools:
  • Others:
@andyzhangx
Member

Can you use storageAccount instead of storageAccountName in volumeAttributes? @technicianted

@technicianted
Author

@andyzhangx with storageAccount instead of storageAccountName, all pods fail to set up the volume. According to the source code, it should be storageAccountName:

  Warning  FailedMount  19s (x8 over 86s)  kubelet  MountVolume.SetUp failed for volume "output" : rpc error: code = Internal desc = Mount failed with error: rpc error: code = Unknown desc = exit status 1 Error: failed to initialize new pipeline [config error in azstorage [account name not provided]]
, output: 
Please refer to http://aka.ms/blobmounterror for possible causes and solutions for mount errors.

@andyzhangx
Member

That's actually the same error. You need to specify the account name in a secret:

kubectl create secret generic azure-secret --from-literal=azurestorageaccountname="xxx" -n pod-namespace

Then specify secretName: azure-secret in volumeAttributes. This is a tricky part of pod in-line volumes:

volumeAttributes:
      azureStorageAuthType: MSI
      azureStorageIdentityClientID: <clientID_here>
      storageAccountName: mystorageaccount
      containerName: mycontainer
      secretName: azure-secret
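
For context, here is a minimal sketch of how the attributes above might sit in a complete pod manifest. The pod name `blob-inline-demo`, the `busybox` image, the mount path `/data`, and the helper function `render_pod_manifest` are hypothetical and not from this thread. The volume name `build-data` and the attribute values are taken from the earlier comments.

```shell
#!/bin/sh
# Sketch only: prints a pod manifest with an in-line blob CSI volume.
# Hypothetical names: blob-inline-demo, busybox, /data, render_pod_manifest.
# $1 = name of the secret that carries the azurestorageaccountname key.
render_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: blob-inline-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: build-data
          mountPath: /data
  volumes:
    - name: build-data
      csi:
        driver: blob.csi.azure.com
        volumeAttributes:
          azureStorageAuthType: MSI
          azureStorageIdentityClientID: <clientID_here>
          storageAccountName: mystorageaccount
          containerName: mycontainer
          secretName: $1
EOF
}

# Possible usage (needs cluster access, so it is left commented out):
# render_pod_manifest azure-secret | kubectl apply -n pod-namespace -f -
```

The secret has to live in the same namespace as the pod, which is why the `kubectl create secret` command above passes `-n pod-namespace`.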

@technicianted
Author

MSI does not need a secret. Note that it is intermittent, not consistently failing. Out of 200 pods, about 10 suffer from this problem. After a few backoffs they mostly succeed.

If a secret were required, it would have failed consistently.

@andyzhangx
Member

No, you only need to specify azurestorageaccountname in the secret. That is how a pod in-line volume gets the account name, for the sake of security.

@technicianted
Author

That seems to have fixed it.

A few follow-up points:

  1. The documentation clearly says that the secret is not mandatory and is only used to store the account secret. Please update the documentation accordingly.
  2. The fact that it still works 90% of the time when no secretName is specified probably indicates a race condition bug in the volume setup. Adding secretName seems to just work around this potential bug.

Thanks for your help.

@technicianted
Author

This problem still happens with Kubernetes secrets, but at a much lower rate: about once every 200 mounts. The code is still racy.

@andyzhangx
Member

What's the current error message?

@technicianted
Author

technicianted commented May 1, 2024

Same: [account name not provided]

@andyzhangx
Member

@technicianted please follow this guide to provide the CSI driver logs from the node: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md#case2-volume-mountunmount-failed. Also, what's the current pod config?

@technicianted
Author

technicianted commented May 17, 2024

The pod volume config is the same as in the original post. blobfuse2 logs from the node:

May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_TRACE [file_cache.go (219)]: FileCache::Configure : file_cache
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_ERR [file_cache.go (271)]: FileCache: config error [tmp-path does not exist. attempting to create tmp-path.]
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_INFO [file_cache.go (304)]: FileCache::Configure : Using default eviction policy
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_INFO [file_cache.go (331)]: FileCache::Configure : create-empty false, cache-timeout 120, tmp-path /mnt/csi-1abdf498f4f317f096db8ac6936f05190662d16fe3f05fba30318c703c009cc2, max-size-mb 0, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000, hard-limit false, policy , allow-non-empty-temp true, cleanup-on-start false, policy-trace false, offload-io false, sync-to-flush false, ignore-sync true, defaultPermission -rwxrwxrwx, diskHighWaterMark 0, maxCacheSize 0, mountPath /var/lib/kubelet/pods/f725b066-8c6f-42d6-a778-e209bf59521d/volumes/kubernetes.io~csi/output/mount
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_TRACE [attr_cache.go (126)]: AttrCache::Configure : attr_cache
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 120, symlink false, cache-on-list true, max-files 5000000
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_TRACE [azstorage.go (84)]: AzStorage::Configure : azstorage
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_TRACE [config.go (292)]: ParseAndValidateConfig : Parsing config
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_ERR [azstorage.go (95)]: AzStorage::Configure : Config validation failed [account name not provided]
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_ERR [pipeline.go (69)]: Pipeline: error creating pipeline component azstorage [config error in azstorage [account name not provided]]
May 04 02:33:23 managed-a100-a00005M blobfuse2[1665499]: LOG_ERR [mount.go (410)]: mount : failed to initialize new pipeline [config error in azstorage [account name not provided]]
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_WARNING [mount.go (385)]: mount: unsupported v1 CLI parameter: pre-mount-validate is always true in blobfuse2.
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_CRIT [mount.go (405)]: Starting Blobfuse2 Mount : 2.3.0~preview.1 on [Ubuntu 22.04.4 LTS]
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_CRIT [mount.go (406)]: Logging level set to : LOG_DEBUG
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_DEBUG [mount.go (407)]: Mount allowed on nonempty path : false
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_TRACE [libfuse.go (253)]: Libfuse::Configure : libfuse
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_INFO [libfuse.go (244)]: Libfuse::Validate : UID 0, GID 0
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_INFO [libfuse.go (305)]: Libfuse::Configure : read-only false, allow-other true, allow-root false, default-perm 511, entry-timeout 120, attr-time 120, negative-timeout 120, ignore-open-flags true, nonempty false, direct_io false, max-fuse-threads 128, fuse-trace false, extension , disable-writeback-cache false, dirPermission 511, mountPath /var/lib/kubelet/pods/f725b066-8c6f-42d6-a778-e209bf59521d/volumes/kubernetes.io~csi/output/mount, umask 0
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_TRACE [file_cache.go (219)]: FileCache::Configure : file_cache
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_INFO [file_cache.go (304)]: FileCache::Configure : Using default eviction policy
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_INFO [file_cache.go (331)]: FileCache::Configure : create-empty false, cache-timeout 120, tmp-path /mnt/csi-1abdf498f4f317f096db8ac6936f05190662d16fe3f05fba30318c703c009cc2, max-size-mb 0, high-mark 80, low-mark 60, refresh-sec 0, max-eviction 5000, hard-limit false, policy , allow-non-empty-temp true, cleanup-on-start false, policy-trace false, offload-io false, sync-to-flush false, ignore-sync true, defaultPermission -rwxrwxrwx, diskHighWaterMark 0, maxCacheSize 0, mountPath /var/lib/kubelet/pods/f725b066-8c6f-42d6-a778-e209bf59521d/volumes/kubernetes.io~csi/output/mount
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_TRACE [attr_cache.go (126)]: AttrCache::Configure : attr_cache
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_INFO [attr_cache.go (156)]: AttrCache::Configure : cache-timeout 120, symlink false, cache-on-list true, max-files 5000000
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_TRACE [azstorage.go (84)]: AzStorage::Configure : azstorage
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_TRACE [config.go (292)]: ParseAndValidateConfig : Parsing config
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_ERR [azstorage.go (95)]: AzStorage::Configure : Config validation failed [account name not provided]
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_ERR [pipeline.go (69)]: Pipeline: error creating pipeline component azstorage [config error in azstorage [account name not provided]]
May 04 02:33:24 managed-a100-a00005M blobfuse2[1665512]: LOG_ERR [mount.go (410)]: mount : failed to initialize new pipeline [config error in azstorage [account name not provided]]
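
To see how many separate mount attempts hit this error, a small filter like the one below may help. It is only a sketch: it assumes the log lines keep the `blobfuse2[PID]:` prefix shown above, and the function name `failed_mount_pids` and the file name `blobfuse2.log` are hypothetical.

```shell
#!/bin/sh
# Sketch only: reads blobfuse2 log lines on stdin and prints the unique PIDs
# of mount attempts that ended with "account name not provided".
# Assumes the "blobfuse2[PID]:" prefix seen in the excerpt above.
failed_mount_pids() {
  grep 'mount : failed to initialize new pipeline' \
    | grep 'account name not provided' \
    | sed -n 's/.*blobfuse2\[\([0-9]*\)].*/\1/p' \
    | sort -u
}

# Possible usage, with the log excerpt saved to a hypothetical file:
# failed_mount_pids < blobfuse2.log
```

Each PID corresponds to one blobfuse2 process, so in the excerpt above the two failures one second apart come from two distinct mount attempts rather than one process logging twice.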

@andyzhangx
Member

@technicianted could you provide the csi driver logs on the node:

kubectl logs csi-blob-node-cvgbs -c blob -n kube-system > csi-blob-node.log

@technicianted
Author

Sorry, we moved away from this driver due to instability, so I can't provide any more logs.
