fix(aws): aws check and metadata fixes #4251

Merged
merged 7 commits into from
Jun 19, 2024

mtronrd
Contributor

@mtronrd mtronrd commented Jun 14, 2024

Context

Some fixes to AWS checks and check metadata

Description

  • Fix metadata for ec2_ebs_volume_snapshots_exists so that SecurityHub findings are created against the volume resource. The check's resource ID is the volume, but the metadata incorrectly identified the resource as a snapshot.
  • Fix the ec2_instance_managed_by_ssm check so it no longer flags instances that are not running. Stopped instances cannot report to SSM, so it doesn't make sense to flag them as unmanaged until they are started. Prowler 2 only checked running instances; here, pending, terminated, and stopped instances, which can't be managed, pass the check.
  • Fix metadata for sns_topics_kms_encryption_at_rest_enabled to use the correct ResourceType. SecurityHub is case sensitive.
  • Fix metadata for sns_topics_not_publicly_accessible to use the correct ResourceType. SecurityHub is case sensitive.
  • Mitigate pagination rate limit errors caused by ssm:DescribeInstanceInformation calls in ssm_service.py in large environments. Boto3 pagination does not reliably handle the rate limiting exceptions for this action, and a short sleep between pages is the best option I've found to make it work consistently in accounts with hundreds or thousands of instances. When these rate limit errors occur, checks that depend on SSM instance information, such as ec2_instance_managed_by_ssm, report large numbers of false positives. The added delay appears negligible, and it resolves the false positives.
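Because SecurityHub compares ResourceType values case-sensitively, a value that differs from a valid type only in letter case silently produces the wrong resource. A minimal sketch of a lint that would catch this, assuming check metadata is loaded as a dict with a `ResourceType` key; the helper name and the short list of known ASFF types are illustrative assumptions, not Prowler's code:

```python
from typing import Optional

# Illustrative subset of Security Hub (ASFF) resource types; not exhaustive.
KNOWN_ASFF_TYPES = {"AwsEc2Instance", "AwsEc2Volume", "AwsSnsTopic"}


def check_resource_type(metadata: dict) -> Optional[str]:
    """Hypothetical lint: return an error if ResourceType only matches a known type ignoring case."""
    value = metadata.get("ResourceType", "")
    if value in KNOWN_ASFF_TYPES:
        return None  # exact, case-sensitive match: nothing to report
    by_lower = {known.lower(): known for known in KNOWN_ASFF_TYPES}
    expected = by_lower.get(value.lower())
    if expected is not None:
        return f"ResourceType {value!r} should be {expected!r} (SecurityHub is case sensitive)"
    return None  # unknown types are out of scope for this sketch
```

For example, a hypothetical metadata entry with `"ResourceType": "awssnstopic"` would be reported with the expected spelling `AwsSnsTopic`.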

All code checks are passing for these changes.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mtronrd mtronrd requested review from a team as code owners June 14, 2024 00:56
@github-actions github-actions bot added the provider/aws Issues/PRs related with the AWS provider label Jun 14, 2024

codecov bot commented Jun 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.62%. Comparing base (60b3523) to head (41d222a).
Report is 27 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4251      +/-   ##
==========================================
- Coverage   86.67%   86.62%   -0.06%     
==========================================
  Files         818      818              
  Lines       25700    25712      +12     
==========================================
- Hits        22275    22272       -3     
- Misses       3425     3440      +15     


Member

@jfagoagas jfagoagas left a comment

Thanks for the changes in metadata @mtronrd !!

Regarding the improvement in the check, please add the missing tests and let us know if you need help with that.

About the sleep, we need to explicitly handle the possible exceptions raised there instead of just adding a little delay.

Thanks for using Prowler 🚀

)
report.resource_id = instance.id
# instances not running should pass the check
if instance.state in ["pending", "terminated", "stopped"]:
Member

Could you add some tests to cover this new behaviour? One for each new state. Thanks!

Contributor Author

New tests added for "running", "stopped", and "terminated". Moto doesn't appear to support mocking transitional states such as "pending" in a way that the Prowler services can read.
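The state rule in the diff excerpt above can be expressed as a pure function, which makes every state, including "pending", testable without Moto. A minimal sketch, assuming the check only needs the instance state and whether SSM reports the instance as managed; `ssm_status_for` is a hypothetical helper, not Prowler's API:

```python
# States in which an instance cannot report to SSM (from the diff above).
NOT_MANAGEABLE_STATES = ("pending", "terminated", "stopped")


def ssm_status_for(instance_state: str, managed_by_ssm: bool) -> str:
    """Hypothetical helper: return "PASS" or "FAIL" for one EC2 instance."""
    if instance_state in NOT_MANAGEABLE_STATES:
        return "PASS"  # cannot be managed until running, so do not flag it
    return "PASS" if managed_by_ssm else "FAIL"


# One test per state, as requested in the review (plain pytest-style asserts).
def test_running_unmanaged_fails():
    assert ssm_status_for("running", managed_by_ssm=False) == "FAIL"


def test_running_managed_passes():
    assert ssm_status_for("running", managed_by_ssm=True) == "PASS"


def test_not_running_states_pass():
    for state in NOT_MANAGEABLE_STATES:
        assert ssm_status_for(state, managed_by_ssm=False) == "PASS"
```

Keeping the decision separate from the AWS calls means the Moto-based tests only need to cover data collection, while the state logic is covered exhaustively.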

@@ -145,6 +146,7 @@ def __describe_instance_information__(self, regional_client):
id=resource_id,
region=regional_client.region,
)
time.sleep(0.1)
Member

I'm not sure about including this here; it doesn't seem deterministic. If there are rate limiting errors, we should handle them explicitly.
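Handling throttling explicitly, as suggested here, usually means retrying the throttled request with exponential backoff. A minimal sketch in plain Python; `call_with_backoff`, `fetch_page`, `is_throttling`, and the retry parameters are hypothetical. With boto3, a throttled call typically surfaces as `botocore.exceptions.ClientError` with `response["Error"]["Code"] == "ThrottlingException"`, which is what `is_throttling` would test in real code:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch_page: Callable[[], T],
    is_throttling: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch_page, retrying throttling errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fetch_page()
        except Exception as error:
            attempt += 1
            # Re-raise non-throttling errors and the last throttled attempt.
            if not is_throttling(error) or attempt >= max_attempts:
                raise
            # Full jitter: wait a random time up to base_delay * 2**(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

In practice, `fetch_page` would wrap a single `describe_instance_information(NextToken=...)` request. botocore's own retry modes (`standard`, `adaptive`, configured through `botocore.config.Config(retries=...)`) may be worth trying before custom code like this.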

Contributor Author

The boto3 paginator should handle the throttling exceptions and retry, but it doesn't work consistently for ssm:DescribeInstanceInformation and exhausts its retries. We escalated this to AWS Support, and their recommendation was "Reduce the frequency of the API calls", which is why I've been using a short sleep between pages like this; it has been working. The rate limits for this action are not documented and seem to behave differently from those of other AWS actions. The problem is also difficult to test for, because it only shows up when large numbers of instances are running (in our case, ephemeral Spot processing jobs). When it occurs, it generates large numbers of false positives: the SSM instance metadata is not captured, so ec2_instance_managed_by_ssm fails for most of the instances. We have not seen that happen since adding the sleep.
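The approach described here, pausing briefly between pages, can be sketched as a generator over a paginator's pages. A minimal sketch: it accepts any iterable of response pages (for example, what `client.get_paginator("describe_instance_information").paginate()` returns in boto3), reads the `InstanceInformationList` key of the SSM response, and defaults to the 0.1-second delay from the diff; the function name is hypothetical:

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def iter_instance_information(
    pages: Iterable[Dict],
    delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield instance entries page by page, pausing before the next page is requested."""
    for page in pages:
        yield from page.get("InstanceInformationList", [])
        # ssm:DescribeInstanceInformation throttles in large accounts and the
        # built-in retries can be exhausted; pausing between pages lowers the
        # request rate, as AWS Support recommended.
        sleep(delay)
```

Because a boto3 page iterator fetches lazily, sleeping after a page is consumed delays the next request rather than the first one; the total added time is roughly `delay` times the number of pages.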

Member

Could you add this message as a comment above the time.sleep call? It'd be great to have this.

Thanks for your analysis 👏

Contributor Author

Good idea, done.

@jfagoagas jfagoagas self-assigned this Jun 17, 2024
@jfagoagas jfagoagas added the status/awaiting-reponse Waiting response from Issue owner label Jun 17, 2024
@jfagoagas jfagoagas removed the status/awaiting-reponse Waiting response from Issue owner label Jun 19, 2024
@jfagoagas jfagoagas self-requested a review June 19, 2024 06:38
Member

@jfagoagas jfagoagas left a comment

Thanks for this contribution @mtronrd 👏

I've made some changes to the tests so they follow our assertion style, just for consistency.

@jfagoagas jfagoagas added the backport-v3 Pending to port to Prowler v3 branch label Jun 19, 2024
@jfagoagas jfagoagas merged commit 9147a45 into prowler-cloud:master Jun 19, 2024
10 of 11 checks passed
Labels
backport-v3 Pending to port to Prowler v3 branch provider/aws Issues/PRs related with the AWS provider
2 participants