Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logs for blobfuse2 diverge from its source documentation #1333

Open
fabio-s-franco opened this issue Apr 8, 2024 · 11 comments
Open

Logs for blobfuse2 diverge from its source documentation #1333

fabio-s-franco opened this issue Apr 8, 2024 · 11 comments

Comments

@fabio-s-franco
Copy link

fabio-s-franco commented Apr 8, 2024

What happened:
Had an issue (Azure/azure-storage-fuse#1376) with HNS enabled blobfuse2 based mount when it is only accessible via private endpoint (public network access disabled). It made it quite difficult to workout the issue because there are no log files on the location documented by blobfuse2 project /var/log/blobfuse2.log.

In fact, the only way I could find any logs at all was to shell into the relevant node agent (csi-blob-node daemonset) and execute a mount command by hand. Other than that, there are no blobfuse2.log files available from within the blob container within the pod.

What you expected to happen:
Find log files to debug issues on the documented location of /var/log/blobfuse2.log

How to reproduce it:
Enable the managed driver via azure-cli or any other means in an AKS cluster. Version installed on my cluster has blobfuse2 2.1.2

  • AKS cluster must be private
  • Storage Account needs to have a private endpoint setup with subresource set to blob (Premium_ZRS sku)
  • Setup a storage class so it has the following mount option:
    "--use-adls=true",
    "-o allow_other",
    "-o attr_timeout=120",
    "--file-cache-timeout-in-seconds=120",
    "--use-attr-cache=true",
    "--cancel-list-on-mount-seconds=10",
    "-o entry_timeout=120",
    "-o negative_timeout=120",
    "--virtual-directory=true"

Anything else we need to know?:

The original issue was caused by the fact that blobfuse2 does not try to connect with standard (non HNS) endpoint fdqn, therefore the private endpoint naively setup using the blob as target subresource instead of dfs, will not be reachable as it will attempt to connect to *.dfs.core.windows.net instead of *.blob.core.windows.net, whose private DNS zone will not exist when the private endpoint was initially set to accommodate standard non HNS mounts. It is not clear and it is very difficult to debug that for the same storage account, if you can mount either HNS enabled or disabled containers, the former requires a private endpoint setup specifically for that purpose which resolves to a different fdqn than the latter.
On top of that it does not seem to be possible to configure the endpoint to point to another fdqn for that case. Even setting AZURE_STORAGE_BLOB_ENDPOINT environment variable directly, does not influence the fdqn used by blobfuse2 to connect.

So, without any logs, it becomes nearly impossible to determine the root cause.
What is documented on https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/csi-debug.md will only output what is directly sent to stdout. For example:

Error: failed to initialize new pipeline [failed to authenticate credentials for azstorage]

This error does not help determine the problem, which can have several different underlying root causes (I alone, have experienced three different ones, for the same error message). Which I believe is an issue for the blobfuse2 to handle (better error output), but nonetheless it doesn't help that no log is persisted within the node agent. Moreover systemctl (or journalctl) command does not work on node agent pod as blobfuse2 is not managed by systemd. So the documentation is either inaccurate or not applicable to managed installation of CSI Blob driver / AKS setup I have.

Environment:

  • CSI Driver version: 1.22.5
  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.9", GitCommit:"d33c44091f0e760b0053f06023e87a1c99dfd302", GitTreeState:"clean", BuildDate:"2024-01-31T01:58:06Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 12 (bookworm)
  • Kernel (e.g. uname -a): 5.15.0-1058-azure chore: update helm charts #66-Ubuntu SMP Fri Feb 16 00:40:24 UTC 2024 x86_64 GNU/Linux
  • Install tools: azure-cli
  • Others: blobfuse2 2.1.2
@andyzhangx
Copy link
Member

@fabio-s-franco if you have installed aks managed blob csi driver (https://learn.microsoft.com/en-us/azure/aks/azure-blob-csi?tabs=NFS#enable-csi-driver-on-a-new-or-existing-aks-cluster), the log file is /var/log/blobfuse2.log on the VM

1 similar comment
@andyzhangx
Copy link
Member

@fabio-s-franco if you have installed aks managed blob csi driver (https://learn.microsoft.com/en-us/azure/aks/azure-blob-csi?tabs=NFS#enable-csi-driver-on-a-new-or-existing-aks-cluster), the log file is /var/log/blobfuse2.log on the VM

@fabio-s-franco
Copy link
Author

@andyzhangx That's precisely what I am saying is not the case:

kubectl exec -it -n kube-system csi-blob-node-xxxxx -c blob -- bash
root@aks-syspool-xxxxxxxx-vmss000000:/# ls -la /var/log 
total 20
drwxr-xr-x 1 root root 4096 Jan 19 02:42 .
drwxr-xr-x 1 root root 4096 Jul  3  2023 ..
drwxr-xr-x 2 root root 4096 Jan 19 02:42 apt
-rw-r--r-- 1 root root 3036 Jan 19 02:42 dpkg.log

That's all there is

@andyzhangx
Copy link
Member

@fabio-s-franco the logs are on the node, run following command

sudo wget https://raw.githubusercontent.com/andyzhangx/demo/master/dev/kubectl-enter
sudo chmod a+x ./kubectl-enter
./kubectl-enter <node-name>

@fabio-s-franco
Copy link
Author

fabio-s-franco commented Apr 9, 2024

@andyzhangx Thank you for looking into it. I have 3 points I observe after following your suggestion:

1 - Using the helper container to shell into the k8s node still does not find the file at the specified location:

./kubectl-enter aks-syspool-xxxxxxxx-vmss000000
spawning "nsenter-46zser" on "aks-syspool-xxxxxxxx-vmss000000"
If you don't see a command prompt, try pressing enter.
root@aks-syspool-xxxxxxxx-vmss000000:/# cat /var/log/blobfuse2.log
cat: /var/log/blobfuse2.log: No such file or directory

2 - The documentation is ambiguous on the meaning of node agent. Every command on the example talks about entering the shell of a container within "csi-node". At no moment it indicates otherwise, other then mentioning agent node, which can be interpreted as the csi-node daemon pod related to the node where the container is being mounted into.
Also, as you can see from my output of entering the shell of the driver, it is the same user (root) in the same VM and blobfuse2 command is executable from within it.
The context may be different, but one can imagine it can get confusing.

3 - Setting aside the confusion, there is very little detail around using the CSI driver with a private endpoint that uses HNS or that it needs to connect via ADLS API, which requires a separate private endpoint (dfs, not blob one).

@andyzhangx
Copy link
Member

can you provide the csi driver daemonset pod logs on the node? and what's the ls -lt /var/log/ output? @fabio-s-franco

@fabio-s-franco
Copy link
Author

fabio-s-franco commented Apr 9, 2024

@andyzhangx This is a sample of the logs now from a pod on a node that provisions the blob mount:

DRIVER INFORMATION:
-------------------
Build Date: "2024-01-19T02:40:14Z"
Compiler: gc
Driver Name: blob.csi.azure.com
Driver Version: v1.22.5
Git Commit: c62217028e177868059beeae7fb3bbd657060511
Go Version: go1.21.3
Platform: linux/amd64

Streaming logs below:
I0409 14:54:57.501690  155991 blob.go:271] driver userAgent: blob.csi.azure.com/v1.22.5 AKS
I0409 14:54:57.501828  155991 azure.go:74] set QPS(25.000000) and QPS Burst(50) for driver kubeClient
I0409 14:54:57.509908  155991 azure.go:88] reading cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571723  155991 azure.go:95] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found
I0409 14:54:57.571746  155991 azure.go:100] could not read cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571756  155991 azure.go:103] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/azure.json
I0409 14:54:57.571790  155991 azure.go:114] read cloud config from file: /etc/kubernetes/azure.json successfully
I0409 14:54:57.572515  155991 azure_auth.go:279] Using AzurePublicCloud environment
I0409 14:54:57.572535  155991 azure_auth.go:130] azure: using managed identity extension to retrieve access token
I0409 14:54:57.572542  155991 azure_auth.go:136] azure: using User Assigned MSI ID to retrieve access token
I0409 14:54:57.572595  155991 azure_auth.go:147] azure: User Assigned MSI ID is client ID
I0409 14:54:57.572623  155991 azure.go:854] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=5, jitter=1.000000
I0409 14:54:57.572831  155991 azure_interfaceclient.go:74] Azure InterfacesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572843  155991 azure_interfaceclient.go:77] Azure InterfacesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572857  155991 azure_vmsizeclient.go:68] Azure VirtualMachineSizesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572868  155991 azure_vmsizeclient.go:71] Azure VirtualMachineSizesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572895  155991 azure_snapshotclient.go:70] Azure SnapshotClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572904  155991 azure_snapshotclient.go:73] Azure SnapshotClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572914  155991 azure_storageaccountclient.go:70] Azure StorageAccountClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572920  155991 azure_storageaccountclient.go:73] Azure StorageAccountClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572927  155991 azure_diskclient.go:68] Azure DisksClient using API version: 2022-07-02
I0409 14:54:57.572936  155991 azure_diskclient.go:73] Azure DisksClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572943  155991 azure_diskclient.go:76] Azure DisksClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572956  155991 azure_vmclient.go:70] Azure VirtualMachine client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572976  155991 azure_vmclient.go:73] Azure VirtualMachine client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572987  155991 azure_vmssclient.go:70] Azure VirtualMachineScaleSetClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572993  155991 azure_vmssclient.go:73] Azure VirtualMachineScaleSetClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573002  155991 azure_vmssvmclient.go:75] Azure vmssVM client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573008  155991 azure_vmssvmclient.go:78] Azure vmssVM client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573021  155991 azure_routeclient.go:69] Azure RoutesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573031  155991 azure_routeclient.go:72] Azure RoutesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573055  155991 azure_subnetclient.go:70] Azure SubnetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573063  155991 azure_subnetclient.go:73] Azure SubnetsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573080  155991 azure_routetableclient.go:69] Azure RouteTablesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573091  155991 azure_routetableclient.go:72] Azure RouteTablesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573105  155991 azure_loadbalancerclient.go:74] Azure LoadBalancersClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573113  155991 azure_loadbalancerclient.go:77] Azure LoadBalancersClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573136  155991 azure_securitygroupclient.go:70] Azure SecurityGroupsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573145  155991 azure_securitygroupclient.go:73] Azure SecurityGroupsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573154  155991 azure_publicipclient.go:74] Azure PublicIPAddressesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573160  155991 azure_publicipclient.go:77] Azure PublicIPAddressesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573180  155991 azure_blobclient.go:73] Azure BlobClient using API version: 2021-09-01
I0409 14:54:57.573195  155991 azure_vmasclient.go:70] Azure AvailabilitySetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573215  155991 azure_vmasclient.go:73] Azure AvailabilitySetsClient  (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573273  155991 azure.go:1097] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0409 14:54:57.573304  155991 azure.go:150] starting node server on node(aks-xxxxxxxxxx-xxxxxxxx-vmss000000)
I0409 14:54:57.573322  155991 blob.go:276] cloud: AzurePublicCloud, location: westeurope, rg: mc-xxxx-euw, VnetName: vnet-xxxxxx-xxxx, VnetResourceGroup: rg-xxxxxx-xxxx-xxxx, SubnetName: snet-xxx-aks-xxxxxxxx
I0409 14:54:57.576002  155991 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I0409 14:54:57.576057  155991 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0409 14:54:57.576065  155991 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0409 14:54:57.576071  155991 driver.go:80] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576077  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0409 14:54:57.576083  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0409 14:54:57.576088  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0409 14:54:57.576093  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576098  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0409 14:54:57.576103  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0409 14:54:57.576108  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0409 14:54:57.576114  155991 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0409 14:54:57.576120  155991 driver.go:90] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576295  155991 server.go:114] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0409 14:54:58.197687  155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.197709  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.199663  155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.351239  155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.351391  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.351430  155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.490367  155991 utils.go:75] GRPC call: /csi.v1.Node/NodeGetInfo
I0409 14:54:58.490382  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.490551  155991 utils.go:82] GRPC response: {"node_id":"aks-xxxxxx-xxxxxxxxxxx-vmss000000"}

Though I have already overcome the issue with DFS connectivity and the problem is more around the ability to properly investigate the cause. which I have more detailed logs on the issue from blobfuse2 repo.

The contents of /var/log are:

total 10424
-rw-r-----   1 syslog  adm              1999761 Apr  9 15:33 messages
-rw-r-----   1 syslog  adm              1994539 Apr  9 15:33 syslog
drwxr-xr-x   2 root    root               12288 Apr  9 15:33 containers
-rw-r-----   1 root    root               42889 Apr  9 15:33 azure-vnet.log
-rw-r-----   1 syslog  adm                68505 Apr  9 15:33 kern.log
-rw-r-----   1 root    root                1935 Apr  9 15:33 azure-vnet-telemetry.log
-rw-r-----   1 syslog  adm                62916 Apr  9 15:33 warn
-rw-r-----   1 root    root               23032 Apr  9 15:33 azure-vnet-ipam.log
drwxr-xr-x  16 root    root                4096 Apr  9 15:33 pods
-rw-r--r--   1 root    root               77594 Apr  9 15:32 waagent.log
-rw-r-----   1 syslog  adm                 5222 Apr  9 15:25 auth.log
drwx------   7 root    root                4096 Apr  9 13:37 azure
drwxr-xr-x   5 root    root                4096 Apr  9 13:03 microsoft-defender-for-cloud
-rw-r-----   1 root    adm                 4070 Apr  9 12:57 cloud-init-output.log
-rw-r-----   1 syslog  adm               164381 Apr  9 12:57 cloud-init.log
-rw-rw-r--   1 root    utmp               17280 Apr  9 12:57 wtmp
-rw-r-----   1 root    adm                58706 Apr  9 12:57 dmesg
drwxr-x---   2 root    adm                 4096 Apr  9 12:57 unattended-upgrades
-rw-r--r--   1 root    root                   0 Apr  9 12:57 ubuntu-advantage.log
-rw-r-----   1 syslog  adm              2029328 Apr  9 12:57 messages.1
-rw-r-----   1 syslog  adm              2391470 Apr  9 12:57 syslog.1
-rw-r-----   1 syslog  adm                32106 Apr  9 12:57 auth.log.1
-rw-r-----   1 syslog  adm               826533 Apr  9 12:57 kern.log.1
-rw-rw----   1 root    utmp                   0 Apr  9 12:57 btmp
-rw-r--r--   1 root    root                   0 Apr  9 12:57 dpkg.log
drwxr-xr-x   2 root    root                4096 Apr  9 12:57 apt
-rw-r--r--   1 root    root                   0 Apr  9 12:57 alternatives.log
-rw-r-----   1 syslog  adm               240139 Apr  9 12:57 warn.1
-rw-rw-r--   1 root    utmp            18701432 Apr  9 12:57 lastlog
drwxr-sr-x+  5 root    systemd-journal     4096 Apr  9 12:56 journal
-rw-r-----   1 root    adm                54699 Mar 19 18:48 dmesg.0
drwx------   2 root    root                4096 Mar 19 16:37 landscape
-rw-r-----   1 root    root                   0 Mar 19 16:37 blobfuse-flexvol-installer.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 blobfuse-driver.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 kv-driver.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 azure-cnimonitor.log
-rw-r-----   1 root    root              173758 Mar 19 16:37 dpkg.log.1
-rw-r-----   1 root    root                   0 Mar 19 16:37 daemon.log
-rw-r-----   1 root    root               10655 Mar 19 16:37 alternatives.log.1
-rw-r-----   1 syslog  adm                28353 Mar 19 16:37 localmessages
-rw-r-----   1 root    root              166508 Mar 19 16:37 ubuntu-advantage.log.1
-rw-r-----   1 root    adm                15934 Mar 19 16:34 dmesg.1.gz
-rw-r-----   1 root    adm                14393 Mar 19 15:52 dmesg.2.gz
-rw-r-----   1 root    adm                12796 Mar 19 15:45 dmesg.3.gz
drwx------   2 root    root                4096 Mar 19 15:44 private
-rw-rw----   1 root    utmp                   0 Feb 29 21:56 btmp.1
drwxrws--T   2 ceph    ceph                4096 Jan 11 12:26 ceph
drwxr-xr-x   2 root    root                4096 Jun  5  2023 sysstat
drwxr-xr-x   2 root    root                4096 Mar 14  2023 dist-upgrade
drwxr-xr-x   2 _chrony _chrony             4096 Aug 25  2020 chrony
drwxr-xr-x   2 root    root                4096 Jan 25  2020 glusterfs

The closest thing is /var/log/blobfuse-driver.log, which as you can see, is empty.
The only way I can get logs from blobfuse2 is by manually executing, blobfuse2 mount command on the csi-node pod. Which then generated the blobfuse file, which helped me identify the issue, which there is more info about, on the blobfuse repo.

edit: find / -name "blobfuse2.log" does not return any result.

@andyzhangx
Copy link
Member

@andyzhangx This is a sample of the logs now from a pod on a node that provisions the blob mount:

DRIVER INFORMATION:
-------------------
Build Date: "2024-01-19T02:40:14Z"
Compiler: gc
Driver Name: blob.csi.azure.com
Driver Version: v1.22.5
Git Commit: c62217028e177868059beeae7fb3bbd657060511
Go Version: go1.21.3
Platform: linux/amd64

Streaming logs below:
I0409 14:54:57.501690  155991 blob.go:271] driver userAgent: blob.csi.azure.com/v1.22.5 AKS
I0409 14:54:57.501828  155991 azure.go:74] set QPS(25.000000) and QPS Burst(50) for driver kubeClient
I0409 14:54:57.509908  155991 azure.go:88] reading cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571723  155991 azure.go:95] InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" not found
I0409 14:54:57.571746  155991 azure.go:100] could not read cloud config from secret kube-system/azure-cloud-provider
I0409 14:54:57.571756  155991 azure.go:103] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/azure.json
I0409 14:54:57.571790  155991 azure.go:114] read cloud config from file: /etc/kubernetes/azure.json successfully
I0409 14:54:57.572515  155991 azure_auth.go:279] Using AzurePublicCloud environment
I0409 14:54:57.572535  155991 azure_auth.go:130] azure: using managed identity extension to retrieve access token
I0409 14:54:57.572542  155991 azure_auth.go:136] azure: using User Assigned MSI ID to retrieve access token
I0409 14:54:57.572595  155991 azure_auth.go:147] azure: User Assigned MSI ID is client ID
I0409 14:54:57.572623  155991 azure.go:854] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=5, jitter=1.000000
I0409 14:54:57.572831  155991 azure_interfaceclient.go:74] Azure InterfacesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572843  155991 azure_interfaceclient.go:77] Azure InterfacesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572857  155991 azure_vmsizeclient.go:68] Azure VirtualMachineSizesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572868  155991 azure_vmsizeclient.go:71] Azure VirtualMachineSizesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572895  155991 azure_snapshotclient.go:70] Azure SnapshotClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572904  155991 azure_snapshotclient.go:73] Azure SnapshotClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572914  155991 azure_storageaccountclient.go:70] Azure StorageAccountClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572920  155991 azure_storageaccountclient.go:73] Azure StorageAccountClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572927  155991 azure_diskclient.go:68] Azure DisksClient using API version: 2022-07-02
I0409 14:54:57.572936  155991 azure_diskclient.go:73] Azure DisksClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572943  155991 azure_diskclient.go:76] Azure DisksClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572956  155991 azure_vmclient.go:70] Azure VirtualMachine client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572976  155991 azure_vmclient.go:73] Azure VirtualMachine client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572987  155991 azure_vmssclient.go:70] Azure VirtualMachineScaleSetClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.572993  155991 azure_vmssclient.go:73] Azure VirtualMachineScaleSetClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573002  155991 azure_vmssvmclient.go:75] Azure vmssVM client (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573008  155991 azure_vmssvmclient.go:78] Azure vmssVM client (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573021  155991 azure_routeclient.go:69] Azure RoutesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573031  155991 azure_routeclient.go:72] Azure RoutesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573055  155991 azure_subnetclient.go:70] Azure SubnetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573063  155991 azure_subnetclient.go:73] Azure SubnetsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573080  155991 azure_routetableclient.go:69] Azure RouteTablesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573091  155991 azure_routetableclient.go:72] Azure RouteTablesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573105  155991 azure_loadbalancerclient.go:74] Azure LoadBalancersClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573113  155991 azure_loadbalancerclient.go:77] Azure LoadBalancersClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573136  155991 azure_securitygroupclient.go:70] Azure SecurityGroupsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573145  155991 azure_securitygroupclient.go:73] Azure SecurityGroupsClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573154  155991 azure_publicipclient.go:74] Azure PublicIPAddressesClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573160  155991 azure_publicipclient.go:77] Azure PublicIPAddressesClient (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573180  155991 azure_blobclient.go:73] Azure BlobClient using API version: 2021-09-01
I0409 14:54:57.573195  155991 azure_vmasclient.go:70] Azure AvailabilitySetsClient (read ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573215  155991 azure_vmasclient.go:73] Azure AvailabilitySetsClient  (write ops) using rate limit config: QPS=10, bucket=100
I0409 14:54:57.573273  155991 azure.go:1097] attach/detach disk operation rate limit QPS: 6.000000, Bucket: 10
I0409 14:54:57.573304  155991 azure.go:150] starting node server on node(aks-xxxxxxxxxx-xxxxxxxx-vmss000000)
I0409 14:54:57.573322  155991 blob.go:276] cloud: AzurePublicCloud, location: westeurope, rg: mc-xxxx-euw, VnetName: vnet-xxxxxx-xxxx, VnetResourceGroup: rg-xxxxxx-xxxx-xxxx, SubnetName: snet-xxx-aks-xxxxxxxx
I0409 14:54:57.576002  155991 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I0409 14:54:57.576057  155991 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0409 14:54:57.576065  155991 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0409 14:54:57.576071  155991 driver.go:80] Enabling controller service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576077  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0409 14:54:57.576083  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0409 14:54:57.576088  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_SINGLE_WRITER
I0409 14:54:57.576093  155991 driver.go:99] Enabling volume access mode: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576098  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0409 14:54:57.576103  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0409 14:54:57.576108  155991 driver.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0409 14:54:57.576114  155991 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0409 14:54:57.576120  155991 driver.go:90] Enabling node service capability: SINGLE_NODE_MULTI_WRITER
I0409 14:54:57.576295  155991 server.go:114] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0409 14:54:58.197687  155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.197709  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.199663  155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.351239  155991 utils.go:75] GRPC call: /csi.v1.Identity/GetPluginInfo
I0409 14:54:58.351391  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.351430  155991 utils.go:82] GRPC response: {"name":"blob.csi.azure.com","vendor_version":"v1.22.5"}
I0409 14:54:58.490367  155991 utils.go:75] GRPC call: /csi.v1.Node/NodeGetInfo
I0409 14:54:58.490382  155991 utils.go:76] GRPC request: {}
I0409 14:54:58.490551  155991 utils.go:82] GRPC response: {"node_id":"aks-xxxxxx-xxxxxxxxxxx-vmss000000"}

Though I have already overcome the issue with DFS connectivity and the problem is more around the ability to properly investigate the cause. which I have more detailed logs on the issue from blobfuse2 repo.

The contents of /var/log are:

total 10424
-rw-r-----   1 syslog  adm              1999761 Apr  9 15:33 messages
-rw-r-----   1 syslog  adm              1994539 Apr  9 15:33 syslog
drwxr-xr-x   2 root    root               12288 Apr  9 15:33 containers
-rw-r-----   1 root    root               42889 Apr  9 15:33 azure-vnet.log
-rw-r-----   1 syslog  adm                68505 Apr  9 15:33 kern.log
-rw-r-----   1 root    root                1935 Apr  9 15:33 azure-vnet-telemetry.log
-rw-r-----   1 syslog  adm                62916 Apr  9 15:33 warn
-rw-r-----   1 root    root               23032 Apr  9 15:33 azure-vnet-ipam.log
drwxr-xr-x  16 root    root                4096 Apr  9 15:33 pods
-rw-r--r--   1 root    root               77594 Apr  9 15:32 waagent.log
-rw-r-----   1 syslog  adm                 5222 Apr  9 15:25 auth.log
drwx------   7 root    root                4096 Apr  9 13:37 azure
drwxr-xr-x   5 root    root                4096 Apr  9 13:03 microsoft-defender-for-cloud
-rw-r-----   1 root    adm                 4070 Apr  9 12:57 cloud-init-output.log
-rw-r-----   1 syslog  adm               164381 Apr  9 12:57 cloud-init.log
-rw-rw-r--   1 root    utmp               17280 Apr  9 12:57 wtmp
-rw-r-----   1 root    adm                58706 Apr  9 12:57 dmesg
drwxr-x---   2 root    adm                 4096 Apr  9 12:57 unattended-upgrades
-rw-r--r--   1 root    root                   0 Apr  9 12:57 ubuntu-advantage.log
-rw-r-----   1 syslog  adm              2029328 Apr  9 12:57 messages.1
-rw-r-----   1 syslog  adm              2391470 Apr  9 12:57 syslog.1
-rw-r-----   1 syslog  adm                32106 Apr  9 12:57 auth.log.1
-rw-r-----   1 syslog  adm               826533 Apr  9 12:57 kern.log.1
-rw-rw----   1 root    utmp                   0 Apr  9 12:57 btmp
-rw-r--r--   1 root    root                   0 Apr  9 12:57 dpkg.log
drwxr-xr-x   2 root    root                4096 Apr  9 12:57 apt
-rw-r--r--   1 root    root                   0 Apr  9 12:57 alternatives.log
-rw-r-----   1 syslog  adm               240139 Apr  9 12:57 warn.1
-rw-rw-r--   1 root    utmp            18701432 Apr  9 12:57 lastlog
drwxr-sr-x+  5 root    systemd-journal     4096 Apr  9 12:56 journal
-rw-r-----   1 root    adm                54699 Mar 19 18:48 dmesg.0
drwx------   2 root    root                4096 Mar 19 16:37 landscape
-rw-r-----   1 root    root                   0 Mar 19 16:37 blobfuse-flexvol-installer.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 blobfuse-driver.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 kv-driver.log
-rw-r-----   1 root    root                   0 Mar 19 16:37 azure-cnimonitor.log
-rw-r-----   1 root    root              173758 Mar 19 16:37 dpkg.log.1
-rw-r-----   1 root    root                   0 Mar 19 16:37 daemon.log
-rw-r-----   1 root    root               10655 Mar 19 16:37 alternatives.log.1
-rw-r-----   1 syslog  adm                28353 Mar 19 16:37 localmessages
-rw-r-----   1 root    root              166508 Mar 19 16:37 ubuntu-advantage.log.1
-rw-r-----   1 root    adm                15934 Mar 19 16:34 dmesg.1.gz
-rw-r-----   1 root    adm                14393 Mar 19 15:52 dmesg.2.gz
-rw-r-----   1 root    adm                12796 Mar 19 15:45 dmesg.3.gz
drwx------   2 root    root                4096 Mar 19 15:44 private
-rw-rw----   1 root    utmp                   0 Feb 29 21:56 btmp.1
drwxrws--T   2 ceph    ceph                4096 Jan 11 12:26 ceph
drwxr-xr-x   2 root    root                4096 Jun  5  2023 sysstat
drwxr-xr-x   2 root    root                4096 Mar 14  2023 dist-upgrade
drwxr-xr-x   2 _chrony _chrony             4096 Aug 25  2020 chrony
drwxr-xr-x   2 root    root                4096 Jan 25  2020 glusterfs

The closest thing is /var/log/blobfuse-driver.log, which as you can see, is empty. The only way I can get logs from blobfuse2 is by manually executing, blobfuse2 mount command on the csi-node pod. Which then generated the blobfuse file, which helped me identify the issue, which there is more info about, on the blobfuse repo.

edit: find / -name "blobfuse2.log" does not return any result.

@fabio-s-franco from the csi driver logs on that node, I don't see there is any mount activity logs, is that the right node?

@fabio-s-franco
Copy link
Author

@andyzhangx You are right, the node where the container was being mounted changed. I can now find these files:

-rw-r-----   1 syslog  adm                    0 Apr  9 16:00 blobfuse2.log
-rw-r-----   1 syslog  adm                 1458 Apr  9 15:51 blobfuse2.log-20240409

But that accounts only for a small part of the problem (log-wise):

1 - The documentation does not detail the nuances of HNS enabled mounts, via private endpoint. Even when the parameter networkEndpointType=privatEndpoint, it does not work when HNS is enabled (which requires creating an additional private endpoint, for DFS subresource). Both kubelet and control plane user assigned identities had the contributor role over the entire subscription (to make sure that was not the problem). And in my case, the creation of the private endpoint is done in terraform because they are needed for other purposes, so I don't really want the driver to create it for me.
In short, making the HNS enabled mount with private endpoints using the driver is very difficult to do due to lack (or I could not find) of examples or details in the docs.

2 - Debugging documentation is not clear about some points (like shell into the aks node, instead of the blob container within csi-blob-node pod). I was only able to get it to work after understanding blobfuse2 and running commands there directly, verifying its logs until I could mount it (after seeing it tries to use .dfs endpoint when --use-adls=true). The logs when ran from there have much more than what I could find on the node agent. All I get from the node agent is:

root@aks-*******-********-vmss000001:/var/log# cat ./blobfuse2.log-20240409 
Apr  9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_ERR [root.go (149)]: beginDetectNewVersion: error parsing remoteVersion [invalid version string]
Apr  9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_WARNING [mount.go (384)]: mount: unsupported v1 CLI parameter: pre-mount-validate is always true in blobfuse2.
Apr  9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_CRIT [mount.go (404)]: Starting Blobfuse2 Mount : 2.2.1 on [Ubuntu 22.04.4 LTS]
Apr  9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_CRIT [mount.go (405)]: Logging level set to : LOG_WARNING
Apr  9 15:51:32 aks-*******-********-vmss000001 blobfuse2[16433]: LOG_ERR [file_cache.go (256)]: FileCache: config error [tmp-path does not exist. attempting to create tmp-path.]
Apr  9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_ERR [root.go (149)]: beginDetectNewVersion: error parsing remoteVersion [invalid version string]
Apr  9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_WARNING [mount.go (384)]: mount: unsupported v1 CLI parameter: pre-mount-validate is always true in blobfuse2.
Apr  9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_CRIT [mount.go (404)]: Starting Blobfuse2 Mount : 2.2.1 on [Ubuntu 22.04.4 LTS]
Apr  9 15:51:33 aks-*******-********-vmss000001 blobfuse2[16442]: LOG_CRIT [mount.go (405)]: Logging level set to : LOG_WARNING
root@aks-*******-********-vmss000001:/var/log#

And the matching logs on daemonset:

I0409 15:51:31.447241    5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:31.449233    5631 blob.go:386] parsing volumeID(**************) return with error: error parsing volume id: "**************", should at least contain two #
I0409 15:51:31.449255    5631 blob.go:458] volumeID(**************) authEnv: []
I0409 15:51:31.575866    5631 blob.go:513] get account(*******) key from secret(default, azure-storage-account-*******-secret) failed with error: could not get secret(azure-storage-account-*******-secret): secrets "azure-storage-account-*******-secret" not found, use cluster identity to get account key instead
I0409 15:51:31.814590    5631 azure_storageaccount.go:163] got storage account key with creation time: 2024-02-26 14:59:30.7437699 +0000 UTC
I0409 15:51:31.814671    5631 nodeserver.go:366] target /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount
protocol fuse2

volumeId **************
context map[containerName:***-*** getLatestAccountKey:true isHnsEnabled:true networkEndpointType:privateEndpoint protocol:fuse2 resourceGroup:rg-****-****-****-euw server:*******.blob.core.windows.net storageAccount:******* useDataPlaneAPI:true]
mountflags [-o negative_timeout=120 --virtual-directory=true -o allow_other --use-adls=true -o attr_timeout=120 -o entry_timeout=120]
mountOptions [--use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true]
args /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
serverAddress *******.blob.core.windows.net
I0409 15:51:31.814749    5631 nodeserver.go:154] start connecting to blobfuse proxy, protocol: fuse2, args: /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
I0409 15:51:31.819421    5631 nodeserver.go:163] begin to mount with blobfuse proxy, protocol: fuse2, args: /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount --use-adls=true --virtual-directory=true -o allow_other -o attr_timeout=120 -o entry_timeout=120 -o negative_timeout=120 --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path=/mnt/************** --container-name=***-*** --pre-mount-validate=true --use-https=true
I0409 15:51:33.394180    5631 mount_linux.go:284] Detected umount with safe 'not mounted' behavior
I0409 15:51:33.394360    5631 nodeserver.go:624] blobfuse mount at /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount success
I0409 15:51:33.394387    5631 nodeserver.go:424] volume(**************) mount on "/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount" succeeded
I0409 15:51:33.394403    5631 utils.go:82] GRPC response: {}
I0409 15:51:33.397375    5631 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0409 15:51:33.397386    5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","target_path":"/var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"*******-*******-z6gvw","csi.storage.k8s.io/pod.namespace":"***","csi.storage.k8s.io/pod.uid":"5f19c464-be92-47d0-891c-028a66873c1b","csi.storage.k8s.io/serviceAccount.name":"***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:33.398065    5631 nodeserver.go:127] NodePublishVolume: volume ************** mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount with mountOptions: [bind]
I0409 15:51:33.398168    5631 mount_linux.go:245] Detected OS without systemd
I0409 15:51:33.398173    5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.408532    5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.409917    5631 nodeserver.go:143] NodePublishVolume: volume ************** mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/5f19c464-be92-47d0-891c-028a66873c1b/volumes/kubernetes.io~csi/pv-blob*******/mount successfully
I0409 15:51:33.409931    5631 utils.go:82] GRPC response: {}
I0409 15:51:33.435282    5631 utils.go:75] GRPC call: /csi.v1.Node/NodePublishVolume
I0409 15:51:33.435303    5631 utils.go:76] GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount","target_path":"/var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount","volume_capability":{"AccessType":{"Mount":{"mount_flags":["-o negative_timeout=120","--virtual-directory=true","-o allow_other","--use-adls=true","-o attr_timeout=120","-o entry_timeout=120"]}},"access_mode":{"mode":5}},"volume_context":{"containerName":"***-***","csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"***-***-f4d754c8b-8qtnt","csi.storage.k8s.io/pod.namespace":"***","csi.storage.k8s.io/pod.uid":"66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1","csi.storage.k8s.io/serviceAccount.name":"***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"},"volume_id":"**************"}
I0409 15:51:33.435922    5631 nodeserver.go:127] NodePublishVolume: volume ************** mounting /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount with mountOptions: [bind]
I0409 15:51:33.435950    5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.440429    5631 mount_linux.go:220] Mounting cmd (mount) with arguments ( -o bind,remount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount)
I0409 15:51:33.441626    5631 nodeserver.go:143] NodePublishVolume: volume ************** mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/******************************/globalmount at /var/lib/kubelet/pods/66eb6a6d-8a8e-4c73-95d3-05e2ced9b4a1/volumes/kubernetes.io~csi/pv-blob*******/mount successfully
I0409 15:51:33.441643    5631 utils.go:82] GRPC response: {}

An example I posted on blobfuse2 issue:

===== RESPONSE ERROR (ServiceCode=AuthorizationFailure) =====
Description=403 This request is not authorized to perform this operation., Details: (none)
   GET https://*.dfs.core.windows.net/mycont?maxResults=2&recursive=false&resource=filesystem&timeout=901 # <<<<<<< Ifinally was able to resolve it, when I saw this
   Authorization: REDACTED
   User-Agent: [Azure-Storage-Fuse/2.1.2 (Debian GNU/Linux 12 (bookworm)) Azure-Storage/0.1 (go1.20.5; linux)]
   X-Ms-Client-Request-Id: [REDACTED]
   X-Ms-Date: [Fri, 05 Apr 2024 11:40:17 GMT]
   X-Ms-Version: [2018-11-09]
   --------------------------------------------------------------------------------
   RESPONSE Status: 403 This request is not authorized to perform this operation.
   Content-Length: [194]
   Content-Type: [application/json;charset=utf-8]
   Date: [Fri, 05 Apr 2024 11:40:17 GMT]
   Server: [Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0]
   X-Ms-Error-Code: [AuthorizationFailure]
   X-Ms-Request-Id: [REDACTED]
   X-Ms-Version: [2018-11-09]
]

So, to sum it up. This process could have been a lot friendlier. And I believe that can be mostly achieved via documentation with accompanying examples and an improved debugging experience, which in part is derived from the wiki debug steps that could use a bit more clarification.

I think that is it. My problem is already resolved, so I am here is just to provided some feedback to help the next warrior.

@andyzhangx
Copy link
Member

andyzhangx commented Apr 10, 2024

@fabio-s-franco I see you have set following mount options in the volume, so --use-adls=true with server: *******.blob.core.windows.net parameter solves your issue, right? and I don't think you need to set networkEndpointType: privateEndpoint since privateEndpoint is created by yourself, and you have already set server parameter

"containerName":"***-***","getLatestAccountKey":"true","isHnsEnabled":"true","networkEndpointType":"privateEndpoint","protocol":"fuse2","resourceGroup":"rg-****-****-****-euw","server":"*******.blob.core.windows.net","storageAccount":"*******","useDataPlaneAPI":"true"

@fabio-s-franco
Copy link
Author

fabio-s-franco commented Apr 11, 2024

@andyzhangx Hi, no these logs are a bit misleading.

When --use-adls=true .blob.core.windows.net endpoint does not work, and the logs are misleading in that case. Setting server to that URL has no effect. I encourage you to have a look at the issue Azure/azure-storage-fuse#1376, as I imagine all that does is set the AZURE_STORAGE_BLOB_ENDPOINT environment variable.

When calling the command with --use-adls=true, this environment variable is not used and the .dfs endpoint is used instead. I caught that by inspecting blobfuse2 logs when manually calling the command to mount the container. You need to setup a separate private endpoint targeting DFS subresouce to use HNS and this can only be noticed when public access is disabled.

It's correct, networkEndpointType does not need to be set, but was part of my trial-and-error debugging as there was no other way for me to do a more objective diagnosis. The documentation is not clear whether it is a requirement to route through private endpoint, or if it's just meant to create the private endpoint automatically (which would not work for HNS, if it doesn't create a private endpoint targetting DFS subresource).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants