Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per-collection metrics for Prometheus #4455

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from
Draft

Per-collection metrics for Prometheus #4455

wants to merge 3 commits into from

Conversation

xhjkl
Copy link
Contributor

@xhjkl xhjkl commented Jun 12, 2024

/claim #3322

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Per-Collection metrics

Please consider my attempt as well. With the current iteration, it looks like this:

New Metrics Output
$ http ':6333/telemetry'

...

"requests": {
  "rest": {
      "responses": {
          "GET /collections": {
              "": {
                  "200": {
                      "avg_duration_micros": 275.0,
                      "count": 1,
                      "last_responded": "2024-06-12T13:14:34.050Z",
                      "max_duration_micros": 275.0,
                      "min_duration_micros": 275.0,
                      "total_duration_micros": 275
                  }
              }
          },
          "GET /metrics": {
              "": {
                  "200": {
                      "avg_duration_micros": 613.2644,
                      "count": 28041,
                      "last_responded": "2024-06-12T13:16:33.610Z",
                      "max_duration_micros": 5306.0,
                      "min_duration_micros": 556.0,
                      "total_duration_micros": 17477445
                  }
              }
          },
          "POST /collections/{name}/points/search": {
              "benchmark": {
                  "200": {
                      "avg_duration_micros": 715.75,
                      "count": 4,
                      "last_responded": "2024-06-12T13:14:58.190Z",
                      "max_duration_micros": 830.0,
                      "min_duration_micros": 615.0,
                      "total_duration_micros": 2863
                  }
              }
          }
      }
  }
}
...

$ http ':6333/metrics'
...
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="0.0005"} 0
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="0.001"} 4
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="+Inf"} 4
rest_responses_duration_seconds_sum{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200"} 0.002863
rest_responses_duration_seconds_count{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200"} 4    
...
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="0.005"} 0
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="0.01"} 1
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="+Inf"} 1
grpc_responses_duration_seconds_sum{endpoint="/qdrant.Points/Search"} 0.006761
grpc_responses_duration_seconds_count{endpoint="/qdrant.Points/Search"} 1

For now, the per-collection metrics are implemnted for webapi-based routes,
but for gRPC they are not.

It's difficult to do the same for gRPC because there's no straightforward way
to extract the collection name from the request body:

  • if we depend on a newer http and hyper crates than tonic, then the exported traits are different, and thus I was unable to come up with a solution to parse the request body from within the middleware layer in a finite amount of time,
  • even if we lock the versions of http and hyper to the versions tonic is using, we still cannot parse the body without consuming the request -- surely, we can consume the request, clone its parts, and pass the cloned one on to the handlers, but I feel bit uneasy about doing this for a purely cosmetic feature.

Perhaps it's worth talking to Tonic if it's possible to introduce an api to peek into the request body inside a middleware layer.

To make sure there is no performance regression, I benchmarked it as follows:

Before:
$ wrk -c1 -t1 -d20s 'http://localhost:6333/metrics'

Running 20s test @ http://localhost:6333/metrics
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   718.24us   74.99us   2.49ms   95.81%
    Req/Sec     1.40k    53.66     1.46k    90.55%
  27948 requests in 20.10s, 107.89MB read
Requests/sec:   1390.46
Transfer/sec:      5.37MB
After:
$ wrk -c1 -t1 -d20s 'http://localhost:6333/metrics'

Running 20s test @ http://localhost:6333/metrics
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   716.82us  118.33us   5.76ms   94.59%
    Req/Sec     1.40k    99.83     1.50k    87.56%
  28040 requests in 20.10s, 114.40MB read
Requests/sec:   1394.96
Transfer/sec:      5.69MB

Before going further, I wanted to check with the code owners whether we are content with the schema change mentioned above?

If we don't want to change the telemetry schema presented to the user, can untie stored *TelemetryData from presented, and split the structure in two?

@xhjkl
Copy link
Contributor Author

xhjkl commented Jun 12, 2024

@timvisee, @generall, would love you to chime in 🙏✨

Copy link

algora-pbc bot commented Jun 12, 2024

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe/Alipay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant