fix(aws): aws check and metadata fixes #4251

mtronrd · 2024-06-14T00:56:27Z

Context

Some fixes to AWS checks and check metadata

Description

Fix metadata for ec2_ebs_volume_snapshots_exists so that SecurityHub findings are created against the volume resource. The resource id for the check is the volume but it was incorrectly identified as a snapshot.
Fix ec2_instance_managed_by_ssm check to exclude stopped instances. Stopped instances cannot report to SSM, and it doesn't make sense to flag them as unmanaged until they are started. The behavior in Prowler 2 was to check only running instances; here, pending, terminated, and stopped instances, which can't be managed, pass the check.
Fix metadata for sns_topics_kms_encryption_at_rest_enabled to use the correct ResourceType. SecurityHub is case sensitive.
Fix metadata for sns_topics_not_publicly_accessible to use the correct ResourceType. SecurityHub is case sensitive.
Mitigate pagination rate limit errors in large environments caused by ssm:DescribeInstanceInformation calls in ssm_service.py. Boto3 pagination does not successfully handle the rate limiting exceptions on this action, and a short sleep between pages is the best option I've found to make it work consistently in accounts with hundreds or thousands of instances. When these rate limit errors occur, any checks that depend on the ssm instance information such as ec2_instance_managed_by_ssm throw large numbers of false positives. The added delay does not seem to be significant and it addresses the false positives.
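The pagination change described in the last item can be sketched as follows. This is a minimal illustration, not the code from the PR: `iter_pages_with_pause` and `collect_instance_ids` are hypothetical names, the pages are assumed to be DescribeInstanceInformation response dicts, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, List


def iter_pages_with_pause(
    pages: Iterable[dict],
    pause_seconds: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each page, then pause before the next page is requested.

    Hypothetical helper: it also pauses once after the final page.
    """
    for page in pages:
        yield page
        # A lazy paginator only fetches the next page when iteration
        # resumes, so sleeping here spaces out successive API calls.
        sleep(pause_seconds)


def collect_instance_ids(
    pages: Iterable[dict],
    pause_seconds: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect instance IDs from DescribeInstanceInformation-style pages."""
    instance_ids = []
    for page in iter_pages_with_pause(pages, pause_seconds, sleep):
        for item in page.get("InstanceInformationList", []):
            instance_ids.append(item["InstanceId"])
    return instance_ids
```

With boto3, the `pages` argument would typically be the iterator returned by `client.get_paginator("describe_instance_information").paginate()`. The 0.1 second default mirrors the value in this PR's diff and may need tuning for other environments.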

All code checks are passing for these changes.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov · 2024-06-14T01:17:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.62%. Comparing base (60b3523) to head (41d222a).
Report is 27 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4251      +/-   ##
==========================================
- Coverage   86.67%   86.62%   -0.06%     
==========================================
  Files         818      818              
  Lines       25700    25712      +12     
==========================================
- Hits        22275    22272       -3     
- Misses       3425     3440      +15

jfagoagas

Thanks for the changes in metadata @mtronrd !!

Regarding the improvement in the check, please add the missing tests and let us know if you need help with that.

About the sleep, we need to explicitly handle the possible exceptions raised there instead of just adding a small delay.

Thanks for using Prowler 🚀

jfagoagas · 2024-06-17T08:58:02Z

prowler/providers/aws/services/ec2/ec2_instance_managed_by_ssm/ec2_instance_managed_by_ssm.py

+            )
+            report.resource_id = instance.id
+            # instances not running should pass the check
+            if instance.state in ["pending", "terminated", "stopped"]:


Could you add some tests to handle this new behaviour? One for each new state. Thanks!

mtronrd · 2024-06-18

New tests added for "running", "stopped", and "terminated". Moto doesn't appear to support mocking the temporary statuses like "pending" in a way that the prowler services can read.
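The state handling discussed in this thread can be sketched as below. It is an illustrative stand-in rather than the actual check: the `Instance` dataclass, `evaluate_instance`, and the status messages are assumptions, and only the list of states comes from the diff above.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# States taken from the diff: instances in these states cannot report to SSM.
NON_MANAGEABLE_STATES = ("pending", "terminated", "stopped")


@dataclass
class Instance:
    """Hypothetical stand-in for the EC2 instance model used by the check."""

    id: str
    state: str


def evaluate_instance(
    instance: Instance, managed_instance_ids: Set[str]
) -> Tuple[str, str]:
    """Return a (status, message) pair for one instance."""
    if instance.state in NON_MANAGEABLE_STATES:
        # Instances that are not running pass instead of being flagged.
        return (
            "PASS",
            f"EC2 Instance {instance.id} is {instance.state} and cannot be managed.",
        )
    if instance.id in managed_instance_ids:
        return "PASS", f"EC2 Instance {instance.id} is managed by Systems Manager."
    return "FAIL", f"EC2 Instance {instance.id} is not managed by Systems Manager."
```

A test per state then reduces to constructing an `Instance` with that state and asserting on the returned status, which sidesteps the need to mock transient states such as "pending".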

jfagoagas · 2024-06-17T08:59:36Z

prowler/providers/aws/services/ssm/ssm_service.py

@@ -145,6 +146,7 @@ def __describe_instance_information__(self, regional_client):
                        id=resource_id,
                        region=regional_client.region,
                    )
+                time.sleep(0.1)


I'm not sure about including this here; it doesn't seem deterministic. If there are rate limiting errors, we should handle them explicitly.

mtronrd · 2024-06-17

The boto3 paginator should handle throttling exceptions and retry, but it does not work consistently for ssm:DescribeInstanceInformation and exhausts its retries. We escalated this to AWS support, and their recommendation was "Reduce the frequency of the API calls", which is why I've been using a short sleep between pages; this has been working. The rate limits for this action are not documented and seem to behave differently from other AWS actions. The problem is difficult to test for because it only shows up with large numbers of running instances, in our case ephemeral Spot processing jobs. When it occurs, it generates large numbers of false positives because the SSM instance metadata is not captured and ec2_instance_managed_by_ssm fails for most of the instances. We have not seen that happen since adding the sleep.

jfagoagas · 2024-06-18

Could you add this message as a comment above the time sleep? It'd be great to have this.

Thanks for your analysis 👏

mtronrd · 2024-06-18

Good idea, done.
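For comparison, the explicit handling raised earlier in this thread could look roughly like the sketch below. It is not what was merged: `call_with_backoff`, `is_throttling_error`, and the set of error codes are assumptions, and the error-code lookup relies on the `response["Error"]["Code"]` layout that botocore's `ClientError` exposes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of AWS error codes that signal throttling.
THROTTLING_CODES = {
    "ThrottlingException",
    "Throttling",
    "TooManyRequestsException",
    "RequestLimitExceeded",
}


def is_throttling_error(error: Exception) -> bool:
    """Return True if the exception carries a throttling error code."""
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") in THROTTLING_CODES


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying throttling errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as error:
            # Re-raise anything that is not throttling, or the last failure.
            if not is_throttling_error(error) or attempt == max_attempts:
                raise
            # Delay doubles on each retry: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Another option is botocore's own retry configuration, for example `Config(retries={"max_attempts": 10, "mode": "adaptive"})` passed when the client is created. Whether either approach is sufficient for ssm:DescribeInstanceInformation is not established in this thread, where the built-in retries were reported to be exhausted.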

jfagoagas

Thanks for this contribution @mtronrd 👏

I've made some changes in the tests to follow our assert style, just for consistency.

John Mastron added 4 commits June 13, 2024 13:23

correct resource types

a15a9f7

pass stopped instances for ec2_instance_managed_by_ssm

9ab2095

fix SubServiceName for ec2_ebs_volume_snapshots_exists

05b119b

slow ssm:DescribeInstanceInformation pagination to handle rate limits in large environments

c90a00f

mtronrd requested review from a team as code owners June 14, 2024 00:56

github-actions bot added the provider/aws Issues/PRs related with the AWS provider label Jun 14, 2024

jfagoagas requested changes Jun 17, 2024

jfagoagas self-assigned this Jun 17, 2024

jfagoagas added the status/awaiting-reponse Waiting response from Issue owner label Jun 17, 2024

John Mastron added 2 commits June 17, 2024 17:49

add instance status tests for ec2_instance_managed_by_ssm

8a47995

comment reason for ssm_service sleep between pages

f08ecc0

jfagoagas removed the status/awaiting-reponse Waiting response from Issue owner label Jun 19, 2024

fix(test): Same style asserts

41d222a

jfagoagas self-requested a review June 19, 2024 06:38

jfagoagas approved these changes Jun 19, 2024

jfagoagas added the backport-v3 Pending to port to Prowler v3 branch label Jun 19, 2024

jfagoagas merged commit 9147a45 into prowler-cloud:master Jun 19, 2024
10 of 11 checks passed
