
When a resource is deleted, a delete report of some other resource is sent instead of the deleted resource. #2110

Open
tokyowizard opened this issue May 29, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. target/kubernetes Issues relating to kubernetes cluster scanning

Comments

@tokyowizard

What steps did you take and what happened:

When setting operator.webhookSendDeletedReports: true and deleting a Kubernetes resource, a delete report for some other resource, which still exists, is sent instead of a report for the deleted resource.

Here's a summary of the commands:

  1. Start up a Kubernetes cluster using Docker Desktop: install Docker Desktop and check "Enable Kubernetes" in the settings.

  2. Install the trivy-operator using the helm chart.

    • In the values.yaml, enable the webhook and sending deleted reports, and enable only the vulnerability reports.

      values.yaml
      • Replace LOCAL_IP with your local IP address (e.g. on macOS: ifconfig -l | xargs -n 1 ipconfig getifaddr)
      operator:
        namespace: "trivy-system"
        configAuditScannerEnabled: false
        rbacAssessmentScannerEnabled: false
        infraAssessmentScannerEnabled: false
        clusterComplianceEnabled: false
        exposedSecretScannerEnabled: false
        metricsVulnIdEnabled: false
        webhookBroadcastURL: http://LOCAL_IP:8080
        webhookSendDeletedReports: true
      trivy:
        mode: ClientServer
        ignoreUnfixed: true

    trivy-server is running separately as standalone in the cluster.

  3. Start up a webhook server to receive and view the payload of the deleted reports that were sent.

    python3 mock_server.py
    mock_server.py: Python code for the webhook server (tested with Python 3.11 and 3.12)
    import http.server
    import json
    import socketserver
    
    class WebhookHandler(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)
            payload = json.loads(post_data.decode('utf-8'))
    
            print("Received Webhook Payload:")
            print(json.dumps(payload, indent=4, sort_keys=True))
    
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b'Webhook received successfully')
    
    PORT = 8080
    
    with socketserver.TCPServer(("", PORT), WebhookHandler) as httpd:
        print(f"Serving at port {PORT}")
        httpd.serve_forever()
  4. Apply a couple of jobs to the cluster.

    kubectl apply -f jobs.yaml
    jobs.yaml
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pod-with-vulnerabilities1
    spec:
      template:
        spec:
          containers:
            - name: agent
              image: datadog/agent:7.50.3
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
              args:
                - date; echo " do nothing..."
          restartPolicy: Never
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pod-with-vulnerabilities2
    spec:
      template:
        spec:
          containers:
            - name: agent
              image: datadog/agent:7.40.0
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
              args:
                - date; echo " do nothing..."
          restartPolicy: Never
  5. Delete one of the jobs.

     kubectl delete job pod-with-vulnerabilities1
  6. Check the logs of the webhook server to see the payload of the deleted report.

The payload of the deleted report was for some other resource instead of the pod-with-vulnerabilities1 job.
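To confirm which workload a received payload actually refers to, the webhook handler above can inspect the report's metadata labels. The label keys below are an assumption about how trivy-operator tags a report with its parent resource, and the payload is a minimal hypothetical example:

```python
# Sketch: extract the owning workload from a report payload.
# The "trivy-operator.resource.*" label keys are assumed, not verified
# against a specific operator version; the payload is hypothetical.

def report_owner(payload):
    labels = payload.get("metadata", {}).get("labels", {})
    return (
        labels.get("trivy-operator.resource.kind"),
        labels.get("trivy-operator.resource.namespace"),
        labels.get("trivy-operator.resource.name"),
    )

payload = {
    "metadata": {
        "name": "job-pod-with-vulnerabilities1-agent",
        "labels": {
            "trivy-operator.resource.kind": "Job",
            "trivy-operator.resource.namespace": "default",
            "trivy-operator.resource.name": "pod-with-vulnerabilities1",
        },
    }
}
print(report_owner(payload))  # ('Job', 'default', 'pod-with-vulnerabilities1')
```

Logging this tuple in the do_POST handler makes it easy to see when the delete payload names a different job than the one that was deleted.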

What did you expect to happen:

I expected a report of the deleted job to have been sent.


Environment:

  • Trivy-Operator version (use trivy-operator version): 0.21.1
  • Kubernetes version (use kubectl version): v1.29.2
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): macOS 14.4.1
@tokyowizard tokyowizard added the kind/bug Categorizes issue or PR as related to a bug. label May 29, 2024
@chen-keinan chen-keinan added priority/backlog Higher priority than priority/awaiting-more-evidence. target/kubernetes Issues relating to kubernetes cluster scanning labels May 29, 2024
@chen-keinan
Copy link
Collaborator

@tokyowizard when a report is deleted, it will get generated again if the resource the report was generated for still exists.
The report and resource are tied with a parent-child relation.
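The parent-child tie described here can be sketched as a simple reconcile check: a deleted report is regenerated as long as its owning resource still exists. This is an illustrative sketch, not the operator's actual code:

```python
# Sketch of the described reconcile behavior: a deleted report is
# regenerated only if its owning resource is still live.

def should_regenerate(report_owner, live_resources):
    """Return True if a deleted report will be created again."""
    return report_owner in live_resources

live = {("Job", "default", "pod-with-vulnerabilities2")}

# Report owned by a still-existing Job -> regenerated.
print(should_regenerate(("Job", "default", "pod-with-vulnerabilities2"), live))  # True
# Report owned by the deleted Job -> stays deleted.
print(should_regenerate(("Job", "default", "pod-with-vulnerabilities1"), live))  # False
```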

@eri-plint

I experience something similar. Sometimes a delete is sent for resources that have not actually been deleted, and updates seem to be sent for resources that have been deleted.

Example logs from my webhook server:

{"time": "2024-06-19 02:26:47", "level":"information", "message": "Received update request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:26:47", "level":"information", "message": "No existing record found for vulnerabilityreports job-eris-application-lpsq6-eris-application (uid e1336e88-782e-4cf5-b594-654c8a49eaaf), creating new record." }
{"time": "2024-06-19 02:27:04", "level":"information", "message": "Received delete request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:27:04", "level":"information", "message": "Deleting vulnerabilityreports job-eris-application-lpsq6-eris-application with id 5a0e" }
{"time": "2024-06-19 02:43:31", "level":"information", "message": "Received update request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:43:31", "level":"information", "message": "No existing record found for vulnerabilityreports job-eris-application-lpsq6-eris-application (uid e1336e88-782e-4cf5-b594-654c8a49eaaf), creating new record." }

@chen-keinan
Collaborator

chen-keinan commented Jun 25, 2024

@eri-plint events are sent for deleted reports, not resources.
Note: reports have a TTL, so they get deleted and then recreated.

@eri-plint

@chen-keinan I think I understand that much. But the report TTL is set to 24h, so why would it be recreated after only ~20 minutes? I would buy that this specific instance was just unlucky timing and not related to the issue with the delete requests being sent for the wrong reports.

And I should maybe have been clearer that I mean the owning resources. For example, I delete job A, but the delete request is sent for the report of job B, although job B (and its associated report resources) still remains.

I also noted that multiple delete requests seem to be sent for the same resource in somewhat short succession, often shortly after it is created. I don't know if that is relevant, but it typically looks like the log below. In this case it was not the job foobar-htbzj that was deleted; the requests were sent when another job was deleted, which is what I assume @tokyowizard described.

2024-06-25 15:10:42.884 {"level":"information","message":"Received update request","fields":{"verb":"update","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:12:58.324 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:16:32.090 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:16:48.028 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:20:28.303 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
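Until the duplicate deletes are explained, a receiver can at least make delete handling idempotent by keying records on the report uid, so repeated delete events like those in the log above become no-ops. A minimal sketch (not trivy-operator code):

```python
# Sketch of an idempotent delete handler: repeated delete events for the
# same report uid only remove the record once; later ones are no-ops.

class ReportStore:
    def __init__(self):
        self.records = {}  # uid -> report record

    def delete(self, uid):
        if uid in self.records:
            del self.records[uid]
            return True   # first delete actually removed something
        return False      # duplicate delete: no-op

store = ReportStore()
store.records["d58d11fd"] = {"name": "job-foobar-htbzj-foobar"}
print(store.delete("d58d11fd"))  # True
print(store.delete("d58d11fd"))  # False (duplicate event ignored)
```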

@chen-keinan
Collaborator

@eri-plint I don't know the details of how your prod env operates, but I can explain how it works on the operator side, and maybe that will help you figure out whether we are missing any details or there is a bug in the operator.

  • The TTL is configured (by default) as 24 hours, so as you mention, a report gets deleted once a day and regenerated with the latest VulnDB.
  • In addition, the report TTL can be set to 0, meaning immediate deletion upon a deployment scaling down and up, and also on a rollout or upgrade of a Pod. This is because, by default, the operator does not keep historical reports.
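The TTL cycle described above can be sketched as a simple expiry check: a report is considered expired (and so deleted, then regenerated) once its creation time plus the TTL has passed. The "24h"-style duration format is an assumption mirroring the values mentioned in this thread:

```python
from datetime import datetime, timedelta

# Sketch of the TTL expiry check described above. The duration parser is
# deliberately minimal ("24h", "30m", "0") and hypothetical, not the
# operator's actual implementation.

def parse_ttl(ttl):
    if ttl.endswith("h"):
        return timedelta(hours=int(ttl[:-1]))
    if ttl.endswith("m"):
        return timedelta(minutes=int(ttl[:-1]))
    return timedelta(seconds=int(ttl or "0"))

def is_expired(created, ttl, now):
    return now >= created + parse_ttl(ttl)

created = datetime(2024, 6, 25, 15, 10)
print(is_expired(created, "24h", datetime(2024, 6, 25, 15, 30)))  # False: only 20 min in
print(is_expired(created, "0", datetime(2024, 6, 25, 15, 10)))    # True: TTL 0 = immediate
```

With a 24h TTL, a recreation after ~20 minutes (as in the logs above) cannot be explained by TTL expiry alone, which is the point eri-plint raises.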
