Per-collection metrics for Prometheus #4455

xhjkl · 2024-06-12T13:47:34Z

/claim #3322

All Submissions:

Contributions should target the dev branch. Did you create your branch from dev?
Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

Have you added an explanation of what your changes do and why you'd like us to include them?
Have you written new tests for your core changes, as applicable?
Have you successfully run tests with your changes locally?

Per-Collection metrics

Please consider my attempt as well. With the current iteration, it looks like this:

New Metrics Output

$ http ':6333/telemetry'

...

"requests": {
  "rest": {
      "responses": {
          "GET /collections": {
              "": {
                  "200": {
                      "avg_duration_micros": 275.0,
                      "count": 1,
                      "last_responded": "2024-06-12T13:14:34.050Z",
                      "max_duration_micros": 275.0,
                      "min_duration_micros": 275.0,
                      "total_duration_micros": 275
                  }
              }
          },
          "GET /metrics": {
              "": {
                  "200": {
                      "avg_duration_micros": 613.2644,
                      "count": 28041,
                      "last_responded": "2024-06-12T13:16:33.610Z",
                      "max_duration_micros": 5306.0,
                      "min_duration_micros": 556.0,
                      "total_duration_micros": 17477445
                  }
              }
          },
          "POST /collections/{name}/points/search": {
              "benchmark": {
                  "200": {
                      "avg_duration_micros": 715.75,
                      "count": 4,
                      "last_responded": "2024-06-12T13:14:58.190Z",
                      "max_duration_micros": 830.0,
                      "min_duration_micros": 615.0,
                      "total_duration_micros": 2863
                  }
              }
          }
      }
  }
}
...

$ http ':6333/metrics'
...
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="0.0005"} 0
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="0.001"} 4
rest_responses_duration_seconds_bucket{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200",le="+Inf"} 4
rest_responses_duration_seconds_sum{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200"} 0.002863
rest_responses_duration_seconds_count{method="POST",endpoint="/collections/{name}/points/search",collection="benchmark",status="200"} 4    
...
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="0.005"} 0
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="0.01"} 1
grpc_responses_duration_seconds_bucket{endpoint="/qdrant.Points/Search",le="+Inf"} 1
grpc_responses_duration_seconds_sum{endpoint="/qdrant.Points/Search"} 0.006761
grpc_responses_duration_seconds_count{endpoint="/qdrant.Points/Search"} 1
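
As a rough illustration of how the REST lines above relate to the telemetry entry (not the code in this PR), the sketch below renders one histogram in the Prometheus text format, with `collection` as an extra label. `RestLabels` and `render_histogram` are hypothetical names, and the bucket bounds are simply copied from the output above.

```rust
use std::fmt::Write;

/// Hypothetical label set for one REST histogram series.
struct RestLabels<'a> {
    method: &'a str,
    endpoint: &'a str,
    collection: &'a str,
    status: u16,
}

/// Render one histogram in the Prometheus text exposition format.
/// `buckets` holds (upper bound in seconds, cumulative count) pairs in ascending order.
/// Note: label values are written verbatim; real code would have to escape
/// quotes and backslashes, since collection names are user-provided.
fn render_histogram(
    name: &str,
    labels: &RestLabels,
    buckets: &[(f64, u64)],
    total_duration_micros: u64,
    count: u64,
) -> String {
    let base = format!(
        "method=\"{}\",endpoint=\"{}\",collection=\"{}\",status=\"{}\"",
        labels.method, labels.endpoint, labels.collection, labels.status
    );
    let mut out = String::new();
    for (le, cumulative) in buckets {
        writeln!(out, "{name}_bucket{{{base},le=\"{le}\"}} {cumulative}").unwrap();
    }
    // The +Inf bucket always equals the total observation count.
    writeln!(out, "{name}_bucket{{{base},le=\"+Inf\"}} {count}").unwrap();
    // Telemetry stores microseconds; Prometheus conventionally uses seconds.
    let sum_seconds = total_duration_micros as f64 / 1_000_000.0;
    writeln!(out, "{name}_sum{{{base}}} {sum_seconds}").unwrap();
    writeln!(out, "{name}_count{{{base}}} {count}").unwrap();
    out
}
```

Called with the `benchmark` entry from the telemetry output (`total_duration_micros` 2863, `count` 4) and the buckets `[(0.0005, 0), (0.001, 4)]`, this yields five lines, with the `_sum` value in seconds.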

For now, the per-collection metrics are implemented for the REST (web API) routes only;
gRPC is not covered yet.
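
For REST, attributing a request to a collection comes down to reading the `{name}` segment of the matched route. The sketch below shows the idea independently of any web framework; `collection_from_path` is a hypothetical helper, and the actual change may well take the value from the framework's own route matching instead.

```rust
/// Match `path` against a route `pattern` such as
/// "/collections/{name}/points/search" and return the segment bound to `{name}`.
/// Returns `None` if the path does not match or the pattern has no `{name}`.
fn collection_from_path<'a>(pattern: &str, path: &'a str) -> Option<&'a str> {
    let mut pattern_segments = pattern.split('/');
    let mut path_segments = path.split('/');
    let mut collection = None;
    loop {
        match (pattern_segments.next(), path_segments.next()) {
            // Both exhausted at the same time: the whole path matched.
            (None, None) => return collection,
            // The placeholder binds any non-empty segment.
            (Some("{name}"), Some(actual)) if !actual.is_empty() => collection = Some(actual),
            // Literal segments must be identical.
            (Some(expected), Some(actual)) if expected == actual => {}
            // Length mismatch or differing literal segment.
            _ => return None,
        }
    }
}
```

Routes without a collection, such as `GET /collections`, produce `None` here, which corresponds to the empty-string key in the telemetry output.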

It's difficult to do the same for gRPC because there's no straightforward way
to extract the collection name from the request body:

If we depend on newer versions of the http and hyper crates than tonic does, the exported traits differ, and I was unable to find a way to parse the request body from within the middleware layer in a reasonable amount of time.
Even if we lock http and hyper to the versions tonic is using, we still cannot parse the body without consuming the request. We could consume the request, clone its parts, and pass the clone on to the handlers, but I feel a bit uneasy about doing this for a purely cosmetic feature.

Perhaps it's worth asking the Tonic maintainers whether they could introduce an API to peek into the request body inside a middleware layer.
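
To make the cost concrete, here is a std-only sketch of what "peeking" would involve once the body bytes have been buffered. It relies on the standard gRPC length-prefixed framing (1-byte compression flag, 4-byte big-endian length) and on plain protobuf wire decoding. It assumes the collection name is string field 1 of the request message, which would need to be checked against the actual `.proto` definitions, and it uses no tonic API at all. `peek_collection_name` and `read_varint` are hypothetical names.

```rust
/// Decode a protobuf varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // malformed: varint longer than 10 bytes
}

/// Extract string field 1 from the first message of a fully buffered gRPC body.
/// ASSUMPTION: field 1 is the collection name.
fn peek_collection_name(body: &[u8]) -> Option<String> {
    let (header, rest) = (body.get(..5)?, body.get(5..)?);
    if header[0] != 0 {
        return None; // compressed payloads would need decompression first
    }
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let message = rest.get(..len)?;

    let mut pos = 0;
    while pos < message.len() {
        let key = read_varint(message, &mut pos)?;
        let (field, wire_type) = (key >> 3, key & 0x7);
        match wire_type {
            // Varint: read and discard.
            0 => {
                read_varint(message, &mut pos)?;
            }
            // Fixed 64-bit: skip.
            1 => pos = pos.checked_add(8)?,
            // Length-delimited: strings, bytes, nested messages.
            2 => {
                let len = usize::try_from(read_varint(message, &mut pos)?).ok()?;
                let end = pos.checked_add(len)?;
                let bytes = message.get(pos..end)?;
                if field == 1 {
                    return String::from_utf8(bytes.to_vec()).ok();
                }
                pos = end;
            }
            // Fixed 32-bit: skip.
            5 => pos = pos.checked_add(4)?,
            _ => return None,
        }
    }
    None
}
```

The decoding itself is cheap. The awkward part is the one described above: a middleware would have to buffer the body first (a message may arrive in several chunks) and then hand an equivalent request on to the handler.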

To make sure there is no performance regression, I benchmarked it as follows:

Before:

$ wrk -c1 -t1 -d20s 'http://localhost:6333/metrics'

Running 20s test @ http://localhost:6333/metrics
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   718.24us   74.99us   2.49ms   95.81%
    Req/Sec     1.40k    53.66     1.46k    90.55%
  27948 requests in 20.10s, 107.89MB read
Requests/sec:   1390.46
Transfer/sec:      5.37MB

After:

$ wrk -c1 -t1 -d20s 'http://localhost:6333/metrics'

Running 20s test @ http://localhost:6333/metrics
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   716.82us  118.33us   5.76ms   94.59%
    Req/Sec     1.40k    99.83     1.50k    87.56%
  28040 requests in 20.10s, 114.40MB read
Requests/sec:   1394.96
Transfer/sec:      5.69MB

Before going further, I wanted to check with the code owners whether we are content with the telemetry schema change shown above.

If we don't want to change the telemetry schema presented to the user, can we untie the stored *TelemetryData from the presented one and split the structure in two?
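
A minimal sketch of what such a split could look like, assuming a simplified stats struct: the stored form is keyed by endpoint, collection, and status, while the presented form drops the collection level by merging entries with `AddAssign`. All type and function names here (`DurationStats`, `Stored`, `Presented`, `present`) are hypothetical stand-ins, not the actual *TelemetryData types.

```rust
use std::collections::BTreeMap;
use std::ops::AddAssign;

/// Simplified, hypothetical stand-in for the per-endpoint duration statistics.
#[derive(Clone, Debug, Default, PartialEq)]
struct DurationStats {
    count: u64,
    total_duration_micros: u64,
    min_duration_micros: Option<u64>,
    max_duration_micros: Option<u64>,
}

impl AddAssign<&DurationStats> for DurationStats {
    fn add_assign(&mut self, other: &DurationStats) {
        self.count += other.count;
        self.total_duration_micros += other.total_duration_micros;
        // `None` means "no observations yet", so it must not win the minimum.
        self.min_duration_micros = match (self.min_duration_micros, other.min_duration_micros) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        };
        // `None < Some(_)`, so the derived ordering already gives the right maximum.
        self.max_duration_micros = self.max_duration_micros.max(other.max_duration_micros);
    }
}

/// Stored form: endpoint -> collection -> status -> stats (feeds /metrics).
type Stored = BTreeMap<String, BTreeMap<String, BTreeMap<u16, DurationStats>>>;
/// Presented form: endpoint -> status -> stats (the existing /telemetry shape).
type Presented = BTreeMap<String, BTreeMap<u16, DurationStats>>;

/// Collapse the collection level so the user-facing schema stays unchanged.
fn present(stored: &Stored) -> Presented {
    let mut presented = Presented::new();
    for (endpoint, per_collection) in stored {
        let per_status = presented.entry(endpoint.clone()).or_default();
        for statuses in per_collection.values() {
            for (status, stats) in statuses {
                *per_status.entry(*status).or_default() += stats;
            }
        }
    }
    presented
}
```

With this shape, `/metrics` could read the stored form directly to emit the `collection` label, while `/telemetry` would serialize the output of `present` and keep its current layout.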

xhjkl · 2024-06-12T13:48:04Z

@timvisee, @generall, would love you to chime in 🙏✨

algora-pbc · 2024-06-12T13:49:34Z

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe/Alipay.

xhjkl added 3 commits June 12, 2024 16:46

newline

a1066f6

AddAssign for OperationDurationStatistics

3f05ee3

per-collection metrics for web api

53e7889

algora-pbc bot mentioned this pull request Jun 12, 2024

Per-collection metrics for Prometheus #3322

Open

algora-pbc bot added the 🙋 Bounty claim label Jun 12, 2024
