Use workload identity instead of storage key? Identity not found? #945

Open
TeamDman opened this issue May 31, 2023 · 10 comments

@TeamDman

Is your feature request related to a problem?/Why is this needed

I have a cluster where I have a k8s service account set up with Workload Identity.

https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview

The identity has Storage Blob Data Reader role, so it should be able to read the storage account?

However, this doesn't seem to be a supported use case. Instead, I think we are required to create a k8s secret with a storage key?

https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md

Describe the solution you'd like in detail

I'd like this to work without needing to create a secret for the storage key, since I should be able to give the workload identity principal the roles necessary to access the storage.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: staticsite-dep
spec:
  revisionHistoryLimit: 3
  replicas: 1
  selector:
    matchLabels:
      app: staticsite-lbl
  template:
    metadata:
      labels:
        app: staticsite-lbl
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: staticsite-cont
          image: nginxinc/nginx-unprivileged:1.24-bullseye-perl
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
          ports:
            - containerPort: 8080
          securityContext: # https://kubernetes.io/docs/concepts/security/pod-security-standards/
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1001
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
                - NET_RAW
            seLinuxOptions:
              type: container_t
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: /usr/share/nginx/html
              name: persistent-storage
            - mountPath: /tmp
              name: temp-storage
            # - name: nginx-conf
            #   mountPath: /etc/nginx/nginx.conf
            #   subPath: nginx.conf
            #   readOnly: true
      volumes:
        # https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/deploy/example/e2e_usage.md
        - name: temp-storage
          emptyDir: {}
        - name: persistent-storage
          csi:
            driver: blob.csi.azure.com
            volumeAttributes:
              containerName: webcontent
              # https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision?tabs=mount-nfs%2Csecret#static-provisioning-parameters
              mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"
              resourceGroup: my-RGP
              storageAccount: staticsitedemo
              AzureStorageAuthType: msi
              AzureStorageIdentityClientId: 00000-000000000-0000000000-00000 # managed identity matches service account
              # https://github.com/kubernetes-sigs/blob-csi-driver/issues/618
              # https://github.com/Azure/azure-storage-fuse#environment-variables
              # secretName: staticsite-storagekey
        # - name: nginx-conf
        #   configMap:
        #     name: nginx-conf
        #     items:
        #       - key: nginx.conf
        #         path: nginx.conf
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-staticsite
  labels:
    azure.workload.identity/use: "true"
  annotations:
    # from terraform output
    azure.workload.identity/client-id: 00000000-0000000-0000000-0000000
    azure.workload.identity/tenant-id: 111111111-111111111-11111111-1111111
automountServiceAccountToken: false

Describe alternatives you've considered

Additional context

The above example fails with the following error:

MountVolume.SetUp failed for volume "persistent-storage" : rpc error: code = Internal desc = Mount failed with error: exit status 255, output: OAUTH Token : Refresh token failed Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Identity not found"}Unable to retrieve OAuth token: Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Identity not found"} Unable to start blobfuse due to authentication or connectivity issues. Please check the readme for valid auth setups. no config filedone reading env varsURI token request URL printed out http://redacted/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=redacted&resource=https://storage.azure.com/

I tried adding the object ID in case that helped, but it seems I was right the first time: only the client ID should need to be specified.

MountVolume.SetUp failed for volume "persistent-storage" : rpc error: code = Internal desc = Mount failed with error: exit status 255, output: OAUTH Token : Refresh token failed Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Only one of 'client_id', 'object_id', 'principal_id', or 'mi_res_id' may be provided"}Unable to retrieve OAuth token: Failed to retrieve OAuth Token from IMDS endpoint (CURLCode: 0, HTTP code: 400): {"error":"invalid_request","error_description":"Only one of 'client_id', 'object_id', 'principal_id', or 'mi_res_id' may be provided"} Unable to start blobfuse due to authentication or connectivity issues. Please check the readme for valid auth setups. no config filedone reading env varsURI token request URL printed out http://redac/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=redac&object_id=redac&resource=https://storage.azure.com/
@andyzhangx
Member

What blob csi driver version are you using? Is it managed by AKS? We have not released workload identity support yet.

@TeamDman
Author

TeamDman commented Jun 1, 2023

I believe it's the AKS-managed one. I will follow up Tuesday when I'm back at the console.

https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster#blob_driver_enabled

@andyzhangx
Member

andyzhangx commented Jun 2, 2023

@TeamDman the managed blob csi driver does not support workload identity; it supports managed identity instead. See details here: https://github.com/kubernetes-sigs/blob-csi-driver/blob/master/docs/workload-identity.md

In your case, you could remove the AzureStorageAuthType and AzureStorageIdentityClientId parameters from volumeAttributes, and then grant the kubelet's managed identity on the agent node read and write access to the storage account. It should then work.

You could also set AzureStorageAuthType: msi and set a correct AzureStorageIdentityClientId value; the blobfuse mount would then directly use the managed identity you assigned.
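
As a rough sketch of the two options above, applied to the inline volume from the original post. This is untested; the client ID is a placeholder, and the role assignments on the storage account are assumed to be done separately:

```yaml
# Option 1 (sketch): drop the auth parameters and rely on the kubelet's
# managed identity on the agent node, which must be granted access to the
# storage account.
volumes:
  - name: persistent-storage
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        resourceGroup: my-RGP
        storageAccount: staticsitedemo
---
# Option 2 (sketch): keep msi auth, but point it at a managed identity that
# is actually assigned to the node. The value below is a placeholder.
volumes:
  - name: persistent-storage
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        resourceGroup: my-RGP
        storageAccount: staticsitedemo
        AzureStorageAuthType: msi
        AzureStorageIdentityClientId: "<client-id-of-identity-assigned-to-the-node>"
```

In both cases the identity lives at the node level, so every pod scheduled on that node can mount with it.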

@TeamDman
Author

TeamDman commented Jun 7, 2023

I hesitate to grant the kubelet identity the perms since that grants the whole cluster access. I'm setting up a shared environment where multiple teams will be using a cluster separated by namespaces; the goal is to be able to use managed identities so I'll check out the second part you linked!

Node pools Kubernetes version: 1.23.12
Node size: Standard_D2s_v3
Cluster Kubernetes version: 1.24.9

@TeamDman
Author

TeamDman commented Jun 14, 2023

Here's what I'm using:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

helmCharts:
- name: blob-csi-driver
  repo: https://raw.githubusercontent.com/kubernetes-sigs/blob-csi-driver/master/charts
  version: v1.18.0
  namespace: kube-system
  releaseName: blob-csi-driver
  valuesInline:
    controller:
      replicas: 1
    node.enableBlobfuseProxy: true
    cloud: AzureStackCloud

Will follow up once I get the time to try the msi auth type.

@andyzhangx
Member

@TeamDman msi means managed identity, and it can only be assigned at the node level. If you want per-namespace identity isolation, an account key stored as a k8s secret is the only way for now.
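
A minimal sketch of the secret-based setup, for reference. The namespace `team-a` is hypothetical, the secret name reuses the one commented out in the original post, and the key names `azurestorageaccountname` / `azurestorageaccountkey` are assumed to match the e2e usage doc linked above, so please verify them there:

```yaml
# Secret holding the storage account key, created in the team's own
# namespace so other namespaces cannot reference it.
apiVersion: v1
kind: Secret
metadata:
  name: staticsite-storagekey
  namespace: team-a # hypothetical namespace
type: Opaque
stringData:
  azurestorageaccountname: staticsitedemo
  azurestorageaccountkey: "<storage-account-key>" # placeholder
---
# Pod-level fragment: the inline volume references the secret by name
# instead of using msi auth.
volumes:
  - name: persistent-storage
    csi:
      driver: blob.csi.azure.com
      volumeAttributes:
        containerName: webcontent
        secretName: staticsite-storagekey
        mountOptions: "-o allow_other --file-cache-timeout-in-seconds=120"
```

The account key grants full access to the storage account, so this isolates namespaces from each other but does not limit the workload to a reader role.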

@TeamDman
Author

Thanks for the clarification, will wait for GA release of the full support then <3

@andyzhangx
Member

/assign @cvvz

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 6, 2024
@andyzhangx removed the lifecycle/stale label Mar 26, 2024
@k8s-ci-robot added the lifecycle/stale label Jun 24, 2024