
Plan for clustering and streaming #277

Open · tiedotguy opened this issue Nov 11, 2019 · 0 comments

tiedotguy (Collaborator) commented:
I've had several in-person conversations about the direction I'm heading with the HTTP refactoring work, but those aren't useful for people outside Atlassian who aren't aware of the approach being taken. This issue documents the roadmap (not the timeline), to explain my thinking and the dependencies between steps.

Step 1: unify HTTP clients. Until recently, every component of the system made its own HTTP client, which resulted in a lot of copying and pasting and some inconsistency. This is done (#260); a rough sketch of the idea is below.
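To make that concrete, this is roughly what "one place builds the clients" looks like. The names and settings here are illustrative, not the actual API in the repo:

```go
// Sketch only: a single constructor that every component calls,
// instead of each one rolling its own *http.Client.
package httppool

import (
	"net"
	"net/http"
	"time"
)

// NewClient returns an HTTP client with shared settings, so
// transports, timeouts, and keep-alives stay consistent everywhere.
func NewClient(timeout time.Duration) *http.Client {
	transport := &http.Transport{
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:        50,
		IdleConnTimeout:     90 * time.Second,
		TLSHandshakeTimeout: 5 * time.Second,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   timeout,
	}
}
```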

Step 2: consolidate HTTP logic. This abstracts away lower-level concerns such as retry logic, and turns "make an HTTP request" into "send a message". The first PR is in draft as #272, but it will need another PR to actually consume it across the code base.
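The shape I have in mind is something like the following. The interface and the retry policy here are placeholders, not the API from #272:

```go
// Sketch only: callers hand over a payload; the transport owns the
// HTTP details, including retries.
package transport

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

// Sender is the "send a message" abstraction.
type Sender interface {
	Send(ctx context.Context, body []byte) error
}

type httpSender struct {
	client  *http.Client
	url     string
	retries int
}

func (s *httpSender) Send(ctx context.Context, body []byte) error {
	var lastErr error
	for attempt := 0; attempt <= s.retries; attempt++ {
		req, err := http.NewRequestWithContext(ctx, http.MethodPost, s.url, bytes.NewReader(body))
		if err != nil {
			return err
		}
		resp, err := s.client.Do(req)
		if err == nil && resp.StatusCode < 300 {
			resp.Body.Close()
			return nil
		}
		if err == nil {
			lastErr = fmt.Errorf("unexpected status %d", resp.StatusCode)
			resp.Body.Close()
		} else {
			lastErr = err
		}
		// Naive linear backoff; a real policy would be configurable.
		time.Sleep(time.Duration(attempt+1) * 100 * time.Millisecond)
	}
	return lastErr
}
```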

Step 3: add clustering. This means receiving a single message in the aggregation layer and splitting it into messages destined for other aggregation hosts, or processing it locally. It sits behind steps 1 and 2 because I don't want to add yet another HTTP client that will inevitably need cleanup.
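Very roughly, the split could look like this. Everything here is hypothetical, including the hashing scheme:

```go
// Sketch only: hash each metric to an aggregation host, keep the
// local ones, and batch the rest per destination host.
package cluster

import "hash/fnv"

type Metric struct {
	Name string
}

// Route partitions incoming metrics into a local slice and per-host
// batches for the rest of the cluster. self is this host's index.
func Route(metrics []Metric, hosts []string, self int) (local []Metric, remote map[string][]Metric) {
	remote = make(map[string][]Metric)
	for _, m := range metrics {
		h := fnv.New32a()
		h.Write([]byte(m.Name))
		idx := int(h.Sum32() % uint32(len(hosts)))
		if idx == self {
			local = append(local, m)
		} else {
			remote[hosts[idx]] = append(remote[hosts[idx]], m)
		}
	}
	return local, remote
}
```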

Step 4: add InfluxDB. This is fairly trivial, but again, I don't want another HTTP client adding to the tech debt.

Step 5: add output streams. Once step 2 is complete and components are sending messages to an HTTP client/transport created by a central system, those messages can be processed in an alternate form. The exact plan isn't decided yet, but it could be something like specifying a URL of kafka://somethingsomething, or specifying a URL of https://somethingsomething and having a roundtripper which submits it to Kafka.
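As a sketch of the second option, the roundtripper could look something like this. Publisher is a stand-in here, not any particular Kafka library's API:

```go
// Sketch only: a custom http.RoundTripper that diverts request
// bodies to a stream instead of the network.
package streamrt

import (
	"io"
	"net/http"
	"strings"
)

// Publisher is a placeholder for a real producer client.
type Publisher interface {
	Publish(topic string, payload []byte) error
}

type kafkaRoundTripper struct {
	pub   Publisher
	topic string
}

// RoundTrip consumes the request body (this flow is always a POST
// with a body), publishes it, and fabricates a 200 response so the
// calling backend is none the wiser.
func (rt *kafkaRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	body, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, err
	}
	req.Body.Close()
	if err := rt.pub.Publish(rt.topic, body); err != nil {
		return nil, err
	}
	return &http.Response{
		StatusCode: http.StatusOK,
		Status:     "200 OK",
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}, nil
}
```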

This sidesteps the issue of coming up with an output format, because the format is defined by the caller of the client. If the caller is the HTTP forwarder, then it's effectively pre-aggregated raw metrics; if the caller is the Datadog backend, then it's the Datadog JSON format; and so on.

Step 6: add input streams for raw data. This is still vague and needs some planning, but it's essentially taking the output from the forwarder, sending it to a stream, and having another instance read that stream and aggregate it. There may be some other steps before this, specifically around handling timestamps.
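Purely as a speculative shape, the consuming side might be a loop like this. Both interfaces are hypothetical:

```go
// Sketch only: pump raw pre-aggregated payloads off a stream and
// into a local aggregator.
package ingest

import "context"

type Stream interface {
	// Next blocks until a raw payload (forwarder output) is available.
	Next(ctx context.Context) ([]byte, error)
}

type Aggregator interface {
	// Ingest parses and aggregates one payload; timestamp handling
	// (the open question noted above) would live behind this.
	Ingest(payload []byte) error
}

// Run feeds the stream into the aggregator until the context is done
// or either side errors.
func Run(ctx context.Context, s Stream, a Aggregator) error {
	for {
		payload, err := s.Next(ctx)
		if err != nil {
			return err
		}
		if err := a.Ingest(payload); err != nil {
			return err
		}
	}
}
```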

tiedotguy self-assigned this Nov 11, 2019