
When a resource is deleted, a delete report of some other resource is sent instead of the deleted resource. #2110

Open
tokyowizard opened this issue May 29, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. target/kubernetes Issues relating to kubernetes cluster scanning

Comments

@tokyowizard

What steps did you take and what happened:

When setting operator.webhookSendDeletedReports: true and deleting a Kubernetes resource, a delete report for some other resource, which still exists, is sent instead of a report for the deleted resource.

Here's a summary of the commands:

  1. Start up a Kubernetes cluster using Docker Desktop: install Docker Desktop and check "Enable Kubernetes" in the settings.

  2. Install the trivy-operator using the helm chart.

    • In the values.yaml, enable the webhook and sending deleted reports, and enable only the vulnerability reports.

      values.yaml
      • Replace LOCAL_IP with your local IP address (e.g. on macOS: ifconfig -l | xargs -n 1 ipconfig getifaddr)
      operator:
        namespace: "trivy-system"
        configAuditScannerEnabled: false
        rbacAssessmentScannerEnabled: false
        infraAssessmentScannerEnabled: false
        clusterComplianceEnabled: false
        exposedSecretScannerEnabled: false
        metricsVulnIdEnabled: false
        webhookBroadcastURL: http://LOCAL_IP:8080
        webhookSendDeletedReports: true
      trivy:
        mode: ClientServer
        ignoreUnfixed: true

    trivy-server is running separately as standalone in the cluster.

  3. Start up a webhook server to receive and view the payload of the deleted reports that were sent.

    python3 mock_server.py
    mock_server.py: Python code for the webhook server (tested with Python 3.11 and 3.12)
    import http.server
    import json
    import socketserver
    
    class WebhookHandler(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length)
            payload = json.loads(post_data.decode('utf-8'))
    
            print("Received Webhook Payload:")
            print(json.dumps(payload, indent=4, sort_keys=True))
    
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(b'Webhook received successfully')
    
    PORT = 8080
    
    with socketserver.TCPServer(("", PORT), WebhookHandler) as httpd:
        print(f"Serving at port {PORT}")
        httpd.serve_forever()
  4. Apply a couple of jobs to the cluster.

    kubectl apply -f jobs.yaml
    jobs.yaml
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pod-with-vulnerabilities1
    spec:
      template:
        spec:
          containers:
            - name: agent
              image: datadog/agent:7.50.3
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
              args:
                - date; echo " do nothing..."
          restartPolicy: Never
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pod-with-vulnerabilities2
    spec:
      template:
        spec:
          containers:
            - name: agent
              image: datadog/agent:7.40.0
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
              args:
                - date; echo " do nothing..."
          restartPolicy: Never
  5. Delete one of the jobs.

     kubectl delete job pod-with-vulnerabilities1
  6. Check the logs of the webhook server to see the payload of the deleted report.

The payload of the deleted report was for some other resource instead of the pod-with-vulnerabilities1 job.
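To confirm which workload a received payload actually refers to, the webhook handler above can inspect the report's metadata labels. The label keys below are an assumption about how trivy-operator tags a report with its parent resource, and the payload is a minimal hypothetical example:

```python
# Sketch: extract the owning workload from a report payload.
# The "trivy-operator.resource.*" label keys are assumed, not verified
# against a specific operator version; the payload is hypothetical.

def report_owner(payload):
    labels = payload.get("metadata", {}).get("labels", {})
    return (
        labels.get("trivy-operator.resource.kind"),
        labels.get("trivy-operator.resource.namespace"),
        labels.get("trivy-operator.resource.name"),
    )

payload = {
    "metadata": {
        "name": "job-pod-with-vulnerabilities1-agent",
        "labels": {
            "trivy-operator.resource.kind": "Job",
            "trivy-operator.resource.namespace": "default",
            "trivy-operator.resource.name": "pod-with-vulnerabilities1",
        },
    }
}
print(report_owner(payload))  # ('Job', 'default', 'pod-with-vulnerabilities1')
```

Logging this tuple in the do_POST handler makes it easy to see when the delete payload names a different job than the one that was deleted.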

What did you expect to happen:

I expected a report of the deleted job to have been sent.


Environment:

  • Trivy-Operator version (use trivy-operator version): 0.21.1
  • Kubernetes version (use kubectl version): v1.29.2
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): macOS 14.4.1
@tokyowizard tokyowizard added the kind/bug Categorizes issue or PR as related to a bug. label May 29, 2024
@chen-keinan chen-keinan added priority/backlog Higher priority than priority/awaiting-more-evidence. target/kubernetes Issues relating to kubernetes cluster scanning labels May 29, 2024
@chen-keinan
Copy link
Collaborator

@tokyowizard when a report is deleted, it will get generated again if the resource the report was generated for still exists.
The report and resource are tied with a parent-child relation.
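The parent-child tie described here can be sketched as a simple reconcile check: a deleted report is regenerated as long as its owning resource still exists. This is an illustrative sketch, not the operator's actual code:

```python
# Sketch of the described reconcile behavior: a deleted report is
# regenerated only if its owning resource is still live.

def should_regenerate(report_owner, live_resources):
    """Return True if a deleted report will be created again."""
    return report_owner in live_resources

live = {("Job", "default", "pod-with-vulnerabilities2")}

# Report owned by a still-existing Job -> regenerated.
print(should_regenerate(("Job", "default", "pod-with-vulnerabilities2"), live))  # True
# Report owned by the deleted Job -> stays deleted.
print(should_regenerate(("Job", "default", "pod-with-vulnerabilities1"), live))  # False
```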

@eri-plint

I experience something similar. Sometimes a delete is sent for resources that have not actually been deleted, and updates seem to be sent for resources that have been deleted.

Example logs from my webhook server:

{"time": "2024-06-19 02:26:47", "level":"information", "message": "Received update request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:26:47", "level":"information", "message": "No existing record found for vulnerabilityreports job-eris-application-lpsq6-eris-application (uid e1336e88-782e-4cf5-b594-654c8a49eaaf), creating new record." }
{"time": "2024-06-19 02:27:04", "level":"information", "message": "Received delete request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:27:04", "level":"information", "message": "Deleting vulnerabilityreports job-eris-application-lpsq6-eris-application with id 5a0e" }
{"time": "2024-06-19 02:43:31", "level":"information", "message": "Received update request for vulnerabilityreports job-eris-application-lpsq6-eris-application" }
{"time": "2024-06-19 02:43:31", "level":"information", "message": "No existing record found for vulnerabilityreports job-eris-application-lpsq6-eris-application (uid e1336e88-782e-4cf5-b594-654c8a49eaaf), creating new record." }

@chen-keinan
Collaborator

chen-keinan commented Jun 25, 2024

@eri-plint events are sent for deleted reports, not resources.
Note: reports have a TTL, so they get deleted and then recreated.

@eri-plint

@chen-keinan I think I understand that much. But the report TTL is set to 24h, so why would it be recreated after only ~20 minutes? I would buy that this specific instance was just unlucky timing and not related to the issue with the delete requests being sent for the wrong reports.

And I should maybe have been clearer that I mean the owning resources. For example, I delete job A, but the delete request is sent for the report of job B, although job B (and its associated report resources) still remains.

I also noted that multiple delete requests seem to be sent for the same resource in somewhat short succession, often shortly after it is created. I don't know if that is relevant, but it typically looks like the log below. In this case it was not the job foobar-htbzj that was deleted; the requests were sent when another job was deleted, which is what I assume @tokyowizard described.

2024-06-25 15:10:42.884 {"level":"information","message":"Received update request","fields":{"verb":"update","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:12:58.324 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:16:32.090 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:16:48.028 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
2024-06-25 15:20:28.303 {"level":"information","message":"Received delete request","fields":{"verb":"delete","kind":"vulnerabilityreports","name":"job-foobar-htbzj-foobar","uid":"d58d11fd-9207-4d97-9837-71fed8b9cdc8"}}
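Until the duplicate deletes are explained, a receiver can at least make delete handling idempotent by keying records on the report uid, so repeated delete events like those in the log above become no-ops. A minimal sketch (not trivy-operator code):

```python
# Sketch of an idempotent delete handler: repeated delete events for the
# same report uid only remove the record once; later ones are no-ops.

class ReportStore:
    def __init__(self):
        self.records = {}  # uid -> report record

    def delete(self, uid):
        if uid in self.records:
            del self.records[uid]
            return True   # first delete actually removed something
        return False      # duplicate delete: no-op

store = ReportStore()
store.records["d58d11fd"] = {"name": "job-foobar-htbzj-foobar"}
print(store.delete("d58d11fd"))  # True
print(store.delete("d58d11fd"))  # False (duplicate event ignored)
```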

@chen-keinan
Collaborator

@eri-plint I don't know the details of how your prod env operates, but I can explain how it works on the operator side, and maybe that will help you figure out whether we are missing any details or there is a bug in the operator.

  • The TTL is configured (by default) as 24 hours, so as you mention, a report gets deleted once a day and regenerated with the latest VulnDB.
  • In addition, the report TTL can be set to 0, meaning immediate deletion upon a deployment scaling down and up, and also on a rollout or upgrade of a Pod. This is because, by default, the operator does not keep historical reports.
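The TTL cycle described above can be sketched as a simple expiry check: a report is considered expired (and so deleted, then regenerated) once its creation time plus the TTL has passed. The "24h"-style duration format is an assumption mirroring the values mentioned in this thread:

```python
from datetime import datetime, timedelta

# Sketch of the TTL expiry check described above. The duration parser is
# deliberately minimal ("24h", "30m", "0") and hypothetical, not the
# operator's actual implementation.

def parse_ttl(ttl):
    if ttl.endswith("h"):
        return timedelta(hours=int(ttl[:-1]))
    if ttl.endswith("m"):
        return timedelta(minutes=int(ttl[:-1]))
    return timedelta(seconds=int(ttl or "0"))

def is_expired(created, ttl, now):
    return now >= created + parse_ttl(ttl)

created = datetime(2024, 6, 25, 15, 10)
print(is_expired(created, "24h", datetime(2024, 6, 25, 15, 30)))  # False: only 20 min in
print(is_expired(created, "0", datetime(2024, 6, 25, 15, 10)))    # True: TTL 0 = immediate
```

With a 24h TTL, a recreation after ~20 minutes (as in the logs above) cannot be explained by TTL expiry alone, which is the point eri-plint raises.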
