-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(aws): aws check and metadata fixes #4251
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4251 +/- ##
==========================================
- Coverage 86.67% 86.62% -0.06%
==========================================
Files 818 818
Lines 25700 25712 +12
==========================================
- Hits 22275 22272 -3
- Misses 3425 3440 +15 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes in metadata @mtronrd !!
Regarding the improvement in the check, please add the missing tests and let us know if you need help with that.
About the sleep, we need to explicitly handle the possibles exceptions raised there instead of just adding a little delay.
Thanks for using Prowler 🚀
) | ||
report.resource_id = instance.id | ||
# instances not running should pass the check | ||
if instance.state in ["pending", "terminated", "stopped"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some test to handle this new behaviour? One for each new state. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New tests added for "running", "stopped", and "terminated". Moto doesn't appear to support mocking the temporary statuses like "pending" in a way that the prowler services can read.
@@ -145,6 +146,7 @@ def __describe_instance_information__(self, regional_client): | |||
id=resource_id, | |||
region=regional_client.region, | |||
) | |||
time.sleep(0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about including this here, it doesn't seem as something deterministic. If there are rate limiting errors we should handle that explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boto3 paginator should be handling the throttling exceptions and retrying, but it doesn't work consistently for ssm:DescribeInstanceInformation and exhausts retries. We escalated this to AWS support and their recommendation was "Reduce the frequency of the API calls", which is why I've been using a short sleep between pages like this which has been working. The rate limits for this action are not documented and seem to behave differently from other AWS actions, and the problem is difficult to test for because it only shows up when you have large numbers of instances running, in our case ephemeral Spot processing jobs. When the problem occurs it generates large numbers of false positives because the ssm instance metadata is not captured and ec2_instance_managed_by_ssm fails for most of the instances, and we have not seen that happen after adding the sleep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this message as a comment above the time sleep? It'd be great to have this.
Thanks for your analysis 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution @mtronrd 👏
I've made some changes in the tests to follow or style doing assert just for consistency.
Context
Some fixes to AWS checks and check metadata
Description
ec2_ebs_volume_snapshots_exists
so that SecurityHub findings are created against the volume resource. The resource id for the check is the volume but it was incorrectly identified as a snapshot.ec2_instance_managed_by_ssm
check to exclude stopped instances. Stopped instances cannot report to SSM, and it doesn't make sense to flag them as unmanaged until they are started. The behavior in Prowler 2 was to only check running instances, here I'm passing pending, terminated, and stopped instances that can't be managed.sns_topics_kms_encryption_at_rest_enabled
to use the correct ResourceType. SecurityHub is case sensitive.sns_topics_not_publicly_accessible
to use the correct ResourceType. SecurityHub is case sensitive.ssm:DescribeInstanceInformation
calls inssm_service.py
. Boto3 pagination does not successfully handle the rate limiting exceptions on this action, and a short sleep between pages is the best option I've found to make it work consistently in accounts with hundreds or thousands of instances. When these rate limit errors occur, any checks that depend on the ssm instance information such asec2_instance_managed_by_ssm
throw large numbers of false positives. The added delay does not seem to be significant and it addresses the false positives.All code checks are passing for these changes.
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.