skip polling providers still processing #5181
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##            main   #5181   +/-  ##
=====================================
  Coverage   94.2%   94.2%
=====================================
  Files        376     376
  Lines      31233   31257    +24
  Branches    3727    3735     +8
=====================================
+ Hits       29412   29435    +23
  Misses      1161    1161
- Partials     660     661     +1
```
/retest
So, we have Unleash short circuits inside of the Orchestrator:
```python
def get_polling_batch(self):
    if self.provider_uuid:
        providers = Provider.objects.filter(uuid=self.provider_uuid)
    else:
        filters = {}
        if self.provider_type:
            filters["type"] = self.provider_type
        providers = Provider.polling_objects.get_polling_batch(settings.POLLING_BATCH_SIZE, filters=filters)
    LOG.info(f"providers: {len(providers)}")
    batch = []
    for provider in providers:
        provider.polling_timestamp = self.dh.now
        provider.save(update_fields=["polling_timestamp"])
        schema_name = provider.account.get("schema_name")
        if is_cloud_source_processing_disabled(schema_name):
            LOG.info(log_json("get_polling_batch", msg="processing disabled for schema", schema=schema_name))
            continue
        if is_source_disabled(provider.uuid):
            LOG.info(
                log_json(
                    "get_polling_batch",
                    msg="processing disabled for source",
                    schema=schema_name,
                    provider_uuid=provider.uuid,
                )
            )
            continue
        batch.append(provider)
    return batch
```
We may want to consider updating the polling timestamp after the `is_cloud_source_processing_disabled` and `is_source_disabled` checks. Otherwise, skipped providers will look like they are "still processing" until we hit our "something went wrong" deadline, which may be confusing behavior.
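A minimal, framework-free sketch of that suggested reordering (plain Python, with dicts standing in for the Django provider models and a hypothetical `is_disabled` callable standing in for the two checks):

```python
from datetime import datetime, timezone

def get_polling_batch(providers, is_disabled):
    """Return providers to poll, stamping polling_timestamp only for
    providers that actually make it into the batch."""
    batch = []
    now = datetime.now(timezone.utc)
    for provider in providers:
        if is_disabled(provider):
            # Skipped providers keep their old timestamp, so they do not
            # appear to be "still processing".
            continue
        provider["polling_timestamp"] = now
        batch.append(provider)
    return batch
```

The trade-off discussed below is that the skipped providers would then remain eligible and be re-collected on every polling pass.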
Update:
I chatted with Luke about this, and it could be problematic to move the timestamp update down, because we would keep re-collecting all the disabled providers (plus whatever else) in every batch. That sounds more problematic than occasionally running an extra download task.
/retest
Jira Ticket
COST-5180
Description
This change adds an additional filter to polling batch collection so that we do not add a provider to the batch if its previous processing has not completed yet (e.g., XL customers that take longer than 24 hours to process).
Testing
Release Notes
Notes:
I left a TODO in here around checking for a data_updated_timestamp that is old and not getting updated. For example, if our download processing fails for any reason (maybe we kill tasks, or something similar), it is conceivable that the data_updated_timestamp won't be updated, meaning we would never poll that provider again without manual intervention.
CORRECTION: I added the following clause to handle this:

```python
| Q(data_updated_timestamp__lte=process_wait_delta)
```
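In plain-Python terms, the combined condition behaves roughly like the predicate below (names are illustrative; `process_wait_delta` is the cutoff datetime from the Q clause above, and `processing` stands in for the "still processing" state):

```python
from datetime import datetime, timedelta, timezone

def is_pollable(provider, process_wait_delta):
    """A provider is eligible for polling if it is not mid-processing,
    OR its data_updated_timestamp is older than the wait cutoff
    (the safety valve for providers stuck in a bad state)."""
    if not provider.get("processing"):
        return True
    ts = provider.get("data_updated_timestamp")
    return ts is not None and ts <= process_wait_delta
```

Without the OR branch, a provider whose processing dies silently would never be polled again.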
I have some interesting metrics on this, if people are interested. We currently have 62 providers in that state; 28 of those are genuinely still processing, though. The rest (34) we just keep queueing up, but they have not completed in years. They are probably in bad states.