
CORE-4600 - Quotas: disable produce quota by default #20142

Merged
merged 2 commits into redpanda-data:dev from quotas/disable-produce-default on Jun 27, 2024

Conversation

pgellert
Contributor

@pgellert pgellert commented Jun 25, 2024

During performance testing it became apparent that the overhead of quota management, while small, is non-negligible (~400ns/request). We believe that some of our latency-sensitive customers would prefer to avoid this overhead on the request path. Therefore, this PR adds the ability to disable the produce quota config and disables it by default.

This is implemented without changing the configuration type (from bounded_property<uint32_t> to property<std::optional<uint32_t>>); instead, I opted to use 0 as a sentinel value for disabled. Locally, I tested the change to property<std::optional<uint32_t>> as well, and it seems to work, since the values are serialized to ss::sstring in the controller log and the serialized values of uint32 appear to be a subset of those of optional<uint32>. However, it still seems lower risk to use a sentinel value here, because it is hard to pin down and test all the places where the config values are serialized into YAML/JSON, and how those serializations would change if we changed the config value to a std::optional<uint32_t>. Since these configs are deprecated and going to be removed in 2 major versions, I believe this is a reasonable trade-off.
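For illustration, here is a minimal sketch of the sentinel-value approach described above. The helper name and free-function shape are hypothetical and not part of the actual change; it only shows how a 0 value in the uint32_t property can be mapped to "no quota":

    // Hypothetical helper: interpret 0 in the uint32_t config as "quota disabled",
    // so callers receive std::nullopt instead of a byte-rate limit.
    #include <cstdint>
    #include <optional>

    std::optional<uint32_t>
    effective_produce_quota(uint32_t target_quota_byte_rate) {
        if (target_quota_byte_rate == 0) {
            return std::nullopt; // 0 is the sentinel for "disabled"
        }
        return target_quota_byte_rate;
    }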

Fixes https://redpandadata.atlassian.net/browse/CORE-4600

Benchmark results

As expected, all the benchmarks that have quotas off improve, which now also includes the default case for produce.

single run iterations:    0
single run duration:      1.000s
number of runs:           5
number of cores:          32
random seed:              2540878669

test                                                            iterations      median         mad         min         max      allocs       tasks        inst
throughput_group.test_quota_manager_on_unlimited_shared           41943040    28.397ns     0.012ns    28.368ns    28.410ns       0.656       0.000         0.0
throughput_group.test_quota_manager_on_unlimited_unique           41943040    27.000ns     0.087ns    26.840ns    28.767ns       0.252       0.001         0.0
throughput_group.test_quota_manager_on_limited_shared             41943040    31.136ns     0.155ns    30.972ns    31.410ns       0.656       0.000         0.0
throughput_group.test_quota_manager_on_limited_unique             41943040    27.482ns     0.126ns    27.294ns    27.608ns       0.252       0.001         0.0
throughput_group.test_quota_manager_off_shared                   608174080     1.597ns     0.009ns     1.588ns     1.727ns       0.094       0.000         0.0
throughput_group.test_quota_manager_off_unique                   618659840     1.524ns     0.009ns     1.515ns     1.721ns       0.063       0.000         0.0
latency_group.existing_client_produce_100_others                    110100   305.827ns     2.455ns   300.770ns   370.914ns      11.000       0.000         0.0
latency_group.existing_client_fetch_100_others                      105000   607.491ns     1.238ns   600.661ns   724.629ns      22.000       0.000         0.0
latency_group.new_client_produce_100_others                         102800   739.801ns     5.008ns   734.794ns   836.278ns      12.040       0.000         0.0
latency_group.new_client_fetch_100_others                           101200     1.016us     0.782ns     1.011us     1.122us      17.040       0.002         0.0
latency_group.existing_client_produce_1000_others                    58700   169.915ns     0.214ns   169.287ns   170.129ns      11.000       0.000         0.0
latency_group.existing_client_fetch_1000_others                      60700   345.662ns     2.627ns   342.027ns   395.834ns      22.000       0.000         0.0
latency_group.new_client_produce_1000_others                         65300   531.249ns     1.291ns   529.958ns   608.963ns      12.020       0.001         0.0
latency_group.new_client_fetch_1000_others                           60100   706.559ns     0.445ns   706.114ns   780.256ns      17.020       0.001         0.0
latency_group.existing_client_produce_10000_others                   13100   172.368ns     0.335ns   172.033ns   173.614ns      11.000       0.000         0.0
latency_group.existing_client_fetch_10000_others                     14400   352.623ns     1.389ns   350.707ns   354.012ns      22.000       0.000         0.0
latency_group.new_client_produce_10000_others                        14300   507.215ns     1.931ns   505.284ns   516.292ns      12.000       0.001         0.0
latency_group.new_client_fetch_10000_others                          14400   696.061ns     4.296ns   691.765ns   705.216ns      17.000       0.001         0.0
latency_group.existing_client_produce_100_others_not_shard_0         71700   376.862ns     3.136ns   373.006ns   379.997ns      11.000       0.002         0.0
latency_group.existing_client_fetch_100_others_not_shard_0           82600   701.857ns     7.485ns   694.373ns   818.803ns      22.000       0.005         0.0
latency_group.new_client_produce_100_others_not_shard_0              70600     2.779us     3.321ns     2.776us     3.208us       6.020       3.000         0.0
latency_group.new_client_fetch_100_others_not_shard_0                68500     3.066us     3.891ns     3.062us     3.546us      11.020       3.001         0.0
latency_group.existing_client_produce_1000_others_not_shard_0        25300   341.737ns     0.975ns   340.613ns   343.275ns      11.000       0.001         0.0
latency_group.existing_client_fetch_1000_others_not_shard_0          25000   688.687ns     5.226ns   683.461ns   718.709ns      22.000       0.002         0.0
latency_group.new_client_produce_1000_others_not_shard_0             23300     3.067us    20.573ns     3.012us     3.087us       6.010       3.000         0.0
latency_group.new_client_fetch_1000_others_not_shard_0               23600     3.252us     2.168ns     3.247us     3.255us      11.010       3.001         0.0
latency_group.existing_client_produce_10000_others_not_shard_0        3400   339.762ns     1.056ns   337.747ns   342.591ns      11.000       0.001         0.0
latency_group.existing_client_fetch_10000_others_not_shard_0          3400   667.726ns     1.000ns   666.725ns   672.398ns      22.000       0.004         0.0
latency_group.new_client_produce_10000_others_not_shard_0             3300     2.815us     5.786ns     2.797us     2.829us       6.000       3.000         0.0
latency_group.new_client_fetch_10000_others_not_shard_0               3300     3.108us    19.237ns     3.076us     4.192us      11.000       3.001         0.0
latency_group.default_configs_produce_worst                         127200    26.415ns     0.197ns    26.003ns    33.435ns       1.000       0.000         0.0
latency_group.default_configs_fetch_worst                           126000    47.980ns     0.153ns    47.827ns    54.350ns       2.000       0.000         0.0

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • The produce client quota (target_quota_byte_rate) is now disabled by default. Previously this was enabled at 2GB/shard/client.id.
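    For operators who relied on the previous behaviour, the quota can be re-enabled by setting the property back to a non-zero byte rate. A hedged example via rpk cluster configuration, assuming the old 2GB default corresponds to 2147483648 bytes:

        rpk cluster config set target_quota_byte_rate 2147483648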

@pgellert
Contributor Author

/dt

@pgellert pgellert force-pushed the quotas/disable-produce-default branch 2 times, most recently from a22894d to cd79309 on June 27, 2024 08:23
@pgellert pgellert self-assigned this Jun 27, 2024
@pgellert pgellert changed the title from "[WIP] Quotas: disable produce quota by default" to "CORE-4600 - Quotas: disable produce quota by default" Jun 27, 2024
@pgellert pgellert requested review from travisdowns, StephanDollberg, a team, michael-redpanda and BenPope and removed request for a team June 27, 2024 09:14
@pgellert pgellert marked this pull request as ready for review June 27, 2024 09:14
@pgellert pgellert requested a review from a team as a code owner June 27, 2024 09:14
Member

@BenPope BenPope left a comment


Does what it says on the tin.

Can you add the benchmark?

src/v/config/configuration.cc (review thread, resolved)
@pgellert
Contributor Author

Can you add the benchmark?

I added it soon after raising the PR. I think if you refresh you should see it.

@BenPope
Member

BenPope commented Jun 27, 2024

Can you add the benchmark?

I added it soon after raising the PR. I think if you refresh you should see it.

I only see changes to src/v/kafka/server/tests/client_quota_translator_test.cc

@pgellert
Contributor Author

If you mean that I should add new benchmarks, I haven't added any because the existing ones cover the change. Specifically these lines from the PR cover letter:

throughput_group.test_quota_manager_off_shared                   608174080     1.597ns     0.009ns     1.588ns     1.727ns       0.094       0.000         0.0
throughput_group.test_quota_manager_off_unique                   618659840     1.524ns     0.009ns     1.515ns     1.721ns       0.063       0.000         0.0
latency_group.default_configs_produce_worst                         127200    26.415ns     0.197ns    26.003ns    33.435ns       1.000       0.000         0.0
latency_group.default_configs_fetch_worst                           126000    47.980ns     0.153ns    47.827ns    54.350ns       2.000       0.000         0.0

BenPope previously approved these changes Jun 27, 2024
@@ -496,12 +496,13 @@ configuration::configuration()
   , target_quota_byte_rate(
       *this,
       "target_quota_byte_rate",
-      "Target request size quota byte rate (bytes per second) - 2GB default",
+      "Target request size quota byte rate (bytes per second) - disabled "
Contributor

@Deflaimun Deflaimun Jun 27, 2024


What does "disabled" mean in this description? It's disabled by default? Or 0 = disabled?
Also, no need to explicitly mention the default number in the description, even if it's referenced in another object

Contributor Author


It's disabled by default? Or 0 = disabled?

Both yes.

Also, no need to explicitly mention the default number in the description, even if it's referenced in another object

I am happy to remove it, but I'm wondering what the best way is to describe that 0 is a sentinel value for disabled, since unfortunately target_quota_byte_rate is an integer and not an optional integer. Would rewriting the ending to "- 0 means disabled" make the most sense? What do you think @Deflaimun?

Contributor


I would simply remove the default value from the description. We can add this extra info on the docs website.

If you really want to have it in the description, maybe something like:
"Target request size quota byte rate (bytes per second) - (default: 0 - disabled)"

Contributor Author


Thank you! Sounds good, I've removed the ending now.

@@ -496,12 +496,13 @@ configuration::configuration()
   , target_quota_byte_rate(
       *this,
       "target_quota_byte_rate",
-      "Target request size quota byte rate (bytes per second) - 2GB default",
+      "Target request size quota byte rate (bytes per second) - disabled "
Contributor

@Deflaimun Deflaimun Jun 27, 2024


Suggested change
-      "Target request size quota byte rate (bytes per second) - disabled "
+      "Target request size quota byte rate (bytes per second)",

Contributor Author


Done

@@ -496,12 +496,13 @@ configuration::configuration()
   , target_quota_byte_rate(
       *this,
       "target_quota_byte_rate",
-      "Target request size quota byte rate (bytes per second) - 2GB default",
+      "Target request size quota byte rate (bytes per second) - disabled "
+      "default (= 0)",
Contributor


Suggested change
-      "default (= 0)",

Contributor Author


Done

* Reinterpret the 0 value of `target_quota_byte_rate` as disabled (this
  is safe as previously the minimum value was 1MB, so no one has this set
  to 0)
* Change the default value to be disabled
Minimize the impact of client quota management in the default (and
expectedly most common) case of having no quotas configured.
@pgellert pgellert dismissed stale reviews from BenPope and michael-redpanda via 19e138d June 27, 2024 15:22
@pgellert pgellert force-pushed the quotas/disable-produce-default branch from cd79309 to 19e138d on June 27, 2024 15:22
@pgellert
Contributor Author

Force-pushed to improve the config description as per feedback.

@pgellert pgellert requested a review from Deflaimun June 27, 2024 15:24
@pgellert pgellert merged commit dd3a163 into redpanda-data:dev Jun 27, 2024
18 checks passed
@pgellert pgellert deleted the quotas/disable-produce-default branch June 27, 2024 20:46