Initial infrastructure for data migrations #20134

mmaslankaprv · 2024-06-25T13:33:41Z

This PR introduces cluster services for data migrations. The services are used for CRUD operations over the data migration abstraction and provide interfaces for other subsystems to interact with migrations.

Backports Required

Release Notes

none

src/v/features/feature_table.cc

src/v/features/feature_table.h

src/v/cluster/data_migration_types.cc

src/v/cluster/commands.h

bashtanov · 2024-06-25T14:26:29Z

src/v/cluster/data_migrated_resources.cc

+            .migration_id = id,
+            .state = target_state,
+          });
+        it->second.state = target_state;


Is it any better than _topics[t] = {id, target_state}?


IIUC _topics[t] will first create a default value, and then copy the RHS over it

Yes, but it's not going to be more work. This default value creation is going to be just an allocation, without even zeroing out the members. We already create a temporary resource_metadata value before the call to the map, and we already create a map entry with the right key if it is not there yet. The only difference is how many bytes we copy into the map entry on insert/update, and I doubt it would make a difference even on a hot path:

Current code, insert: copy the whole value, then copy target_state again.

Current code, update: just copy the target_state.

Suggested simple code, insert: copy the whole value.

Suggested simple code, update: copy the whole value.
No clear winner for performance, but a clear winner for readability.
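A minimal sketch contrasting the two shapes discussed above, using hypothetical stand-ins for `resource_metadata` and `migrated_resource_state` (the real types in src/v/cluster differ) and assuming an emplace-style call like `try_emplace` in the reviewed code. One behavioural detail worth noting: the emplace-then-set variant keeps the existing `migration_id` on update, while `_topics[t] = {...}` overwrites it, so the two are interchangeable only when an existing entry already carries the same id.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins; only the shape matters for this comparison.
enum class migrated_resource_state { non_restricted, blocked };

struct resource_metadata {
    int64_t migration_id{-1};
    migrated_resource_state state{migrated_resource_state::non_restricted};
};

using topic_map = std::unordered_map<std::string, resource_metadata>;

// Emplace-then-set: insert copies the whole value and then the state again;
// update copies only the state and leaves migration_id untouched.
void set_state_emplace(
  topic_map& topics, const std::string& t, int64_t id,
  migrated_resource_state target_state) {
    auto it = topics.try_emplace(t, resource_metadata{id, target_state}).first;
    it->second.state = target_state;
}

// Subscript-assign: operator[] creates a default entry if missing, then the
// whole value is assigned on both insert and update.
void set_state_subscript(
  topic_map& topics, const std::string& t, int64_t id,
  migrated_resource_state target_state) {
    topics[t] = resource_metadata{id, target_state};
}
```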

src/v/cluster/data_migration_table.cc

src/v/cluster/data_migration_service_handler.cc

src/v/cluster/controller.cc

dotnwat

seems like a fair amount of this PR could be merged as smaller PRs

src/v/cluster/logger.cc

dotnwat · 2024-06-26T23:19:47Z

src/v/cluster/data_migration_types.cc

+}
+
+std::ostream& operator<<(std::ostream& o, const cloud_storage_location&) {
+    fmt::print(o, "{{}}");


missing parameter?

the cloud_storage_location is a placeholder empty type for now
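For context on the "missing parameter?" question: in fmt format strings `{{` and `}}` are escaped braces, so `fmt::print(o, "{{}}")` writes the literal text `{}` and needs no argument. A plain-iostream equivalent for an empty placeholder type might look like this (the struct is a stand-in with no fields, mirroring the reply):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Stand-in for an empty placeholder type: no fields yet, nothing to print.
struct cloud_storage_location {};

// Same output as fmt::print(o, "{{}}"): the literal characters "{}".
std::ostream& operator<<(std::ostream& o, const cloud_storage_location&) {
    return o << "{}";
}
```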

dotnwat · 2024-06-26T23:21:49Z

src/v/cluster/data_migration_types.h

+ * same identifiers.
+ */
+using data_migration_id = named_type<int64_t, struct data_migration_type_tag>;
+using consumer_group = named_type<ss::sstring, struct consumer_group_tag>;


just thinking that consumer_group seems like a pretty generic name to have at the cluster namespace level. I wonder if you need a data_migration namespace?

yeah, we have a kafka::group_id. It is not ideal to have this type in cluster; I will move it into a separate namespace.
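A minimal sketch of the proposed layout, assuming the alias moves into `cluster::data_migrations` (the namespace the frontend later uses). `named_type` here is a simplified stand-in for the real strong-typedef template, and `kafka::group_id` is redeclared locally only so the example compiles on its own. The distinct tag types keep the two aliases from being mixed up, and the namespace makes the migration-specific meaning explicit.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Simplified stand-in for a strong typedef; the real named_type is richer.
template<typename T, typename Tag>
class named_type {
public:
    named_type() = default;
    explicit named_type(T v) : _value(std::move(v)) {}
    const T& operator()() const { return _value; }

private:
    T _value{};
};

// Local stand-in for the existing Kafka-level identifier.
namespace kafka {
using group_id = named_type<std::string, struct group_id_tag>;
}

// Migration-specific alias scoped to its own namespace.
namespace cluster::data_migrations {
using consumer_group = named_type<std::string, struct consumer_group_tag>;
}
```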

src/v/cluster/data_migration_types.h

src/v/cluster/data_migrated_resources.cc

src/v/cluster/data_migration_table.cc

dotnwat · 2024-06-27T00:08:40Z

src/v/cluster/data_migration_table.h

+    data_migration_id _next_id{0};
+    data_migration_id _last_applied{};


what is the expected difference here for initialization values?

_last_applied is initialized to the min value of int64_t to indicate that nothing was applied, while we start assigning migration ids from 0
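A minimal sketch of the two initializations described in the reply. `named_id` is a hypothetical stand-in that assumes, as the reply implies, that a default-constructed id holds `std::numeric_limits<int64_t>::min()`; `migrations_table_state` and `allocate_id` are illustrative names. The min value works as an "unset" sentinel because real ids start at 0 and only grow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in strong id type whose default value is the minimum of T.
template<typename T, typename Tag>
struct named_id {
    T value = std::numeric_limits<T>::min();
    constexpr named_id() = default;
    constexpr explicit named_id(T v) : value(v) {}
    constexpr bool operator==(const named_id& o) const { return value == o.value; }
};

using data_migration_id = named_id<int64_t, struct data_migration_type_tag>;

// Hypothetical slice of the table: ids are handed out from 0, while the
// default-constructed _last_applied means "nothing applied yet".
struct migrations_table_state {
    data_migration_id _next_id{0};
    data_migration_id _last_applied{};

    bool nothing_applied() const { return _last_applied == data_migration_id{}; }

    data_migration_id allocate_id() {
        auto id = _next_id;
        _next_id = data_migration_id{_next_id.value + 1};
        return id;
    }
};
```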

src/v/cluster/data_migration_service_handler.cc

Signed-off-by: Michał Maślanka <[email protected]>

Added a feature that needs to be active for Redpanda to support data migrations. Signed-off-by: Michał Maślanka <[email protected]>

Introduced a separate logger for data migrations to easily separate all log entries related to migrating data across clusters. Signed-off-by: Michał Maślanka <[email protected]>

Introduced types representing inbound and outbound data migrations together with their state and related metadata. Signed-off-by: Michał Maślanka <[email protected]>

Introduced commands to manage data migrations. The commands represent creation, update and deletion of a migration. Signed-off-by: Michał Maślanka <[email protected]>

src/v/cluster/data_migration_table.cc

src/v/cluster/CMakeLists.txt

bashtanov · 2024-06-27T12:11:02Z

src/v/cluster/data_migration_table.cc

+ss::future<> migrations_table::apply_snapshot(
+  model::offset, const controller_snapshot& snapshot) {
+    _next_id = snapshot.data_migrations.next_id;
+    _migrations.reserve(_migrations.size());


This function also does not account for removing entries; I'll be fixing it in my PR, so no action is needed here.
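A minimal, synchronous sketch of snapshot application that also handles removals, using hypothetical simplified types (`migrations_snapshot`, `migrations_table_sketch`, and the string states are illustrative, not the real controller snapshot). Entries missing from the snapshot are dropped, and their ids are returned so a caller could also release dependent state. The sketch sizes the map from the snapshot, since reserving the container's own current size, as in the quoted hunk, does not make room for the incoming data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified shapes.
struct migration_metadata {
    int64_t id;
    std::string state; // illustrative only
};

struct migrations_snapshot {
    int64_t next_id;
    std::vector<migration_metadata> migrations;
};

class migrations_table_sketch {
public:
    // Replaces the contents with the snapshot; returns ids that were removed.
    std::vector<int64_t> apply_snapshot(const migrations_snapshot& snap) {
        _next_id = snap.next_id;

        std::unordered_map<int64_t, migration_metadata> incoming;
        incoming.reserve(snap.migrations.size()); // size for incoming data
        for (const auto& m : snap.migrations) {
            incoming.emplace(m.id, m);
        }

        std::vector<int64_t> removed;
        for (const auto& entry : _migrations) {
            if (incoming.count(entry.first) == 0) {
                removed.push_back(entry.first);
            }
        }
        _migrations = std::move(incoming);
        return removed;
    }

    bool has(int64_t id) const { return _migrations.count(id) > 0; }
    std::size_t size() const { return _migrations.size(); }
    int64_t next_id() const { return _next_id; }

private:
    int64_t _next_id{0};
    std::unordered_map<int64_t, migration_metadata> _migrations;
};
```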

src/v/cluster/data_migration_table.h

src/v/cluster/fwd.h

src/v/cluster/data_migration_frontend.h

Introduced a class that is going to be instantiated on every shard and will contain information about migrated resources. The class is intended to be used by validation logic in the hot path, where migration information will be queried to block writes and property updates. Signed-off-by: Michał Maślanka <[email protected]>
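A minimal sketch of the kind of per-shard lookup this describes. Everything here is a hypothetical stand-in: only `blocked` appears in the reviewed diff, and `metadata_locked`, `writes_allowed` and `property_updates_allowed` are illustrative. Because each shard would hold its own copy, a hot-path check reduces to a local hash lookup with no cross-shard hop; in Seastar this would typically be an instance per shard, which the sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical states; only `blocked` is taken from the reviewed diff.
enum class migrated_resource_state { non_restricted, metadata_locked, blocked };

struct resource_metadata {
    int64_t migration_id;
    migrated_resource_state state;
};

// One instance per shard: queries never leave the local shard.
class migrated_resources_sketch {
public:
    void set(const std::string& topic, resource_metadata md) { _topics[topic] = md; }
    void erase(const std::string& topic) { _topics.erase(topic); }

    migrated_resource_state state_of(const std::string& topic) const {
        auto it = _topics.find(topic);
        return it == _topics.end() ? migrated_resource_state::non_restricted
                                   : it->second.state;
    }

    bool writes_allowed(const std::string& topic) const {
        return state_of(topic) != migrated_resource_state::blocked;
    }

    bool property_updates_allowed(const std::string& topic) const {
        return state_of(topic) == migrated_resource_state::non_restricted;
    }

private:
    std::unordered_map<std::string, resource_metadata> _topics;
};
```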


bashtanov · 2024-06-28T08:05:12Z

src/v/cluster/data_migrated_resources.cc

+                .migration_id = id,
+                .state = migrated_resource_state::blocked,
+              });
+            vassert(


I can think of 2 situations where one of these asserts can fail:

duplicate topics or groups in a single migration

race condition: can asynchronous calls to migrated resources be reordered?
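The first case can be rejected before any state is touched. A minimal sketch of such an up-front check, assuming a hypothetical simplified `migration_request` holding topic and group names as strings. Rejecting duplicates at validation time turns a would-be assertion failure into an ordinary error.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified migration definition.
struct migration_request {
    std::vector<std::string> topics;
    std::vector<std::string> groups;
};

// True if any name appears more than once in the list.
bool has_duplicates(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;
    seen.reserve(names.size());
    for (const auto& n : names) {
        if (!seen.insert(n).second) {
            return true;
        }
    }
    return false;
}

// Validate before creating the migration, so the same resource is never
// registered twice further down the line.
bool is_valid_migration(const migration_request& r) {
    return !has_duplicates(r.topics) && !has_duplicates(r.groups);
}
```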

bashtanov · 2024-06-28T08:06:30Z

src/v/cluster/data_migration_table.h

+    }
+
+private:
+    friend class migration_frontend;


nit: change it to frontend in this commit?

bashtanov · 2024-06-28T08:07:14Z

src/v/cluster/fwd.h

@@ -84,6 +84,7 @@ class rm_stm;
 namespace data_migrations {
 class migrated_resources;
 class migrations_table;
+class migration_frontend;


nit: change it to frontend here rather than in a later commit

bashtanov · 2024-06-28T08:08:15Z

src/v/cluster/data_migration_frontend.cc

@@ -0,0 +1,343 @@
+/*


commit message for the frontend commit refers to the wrong class name

Introduce a data migration table that is intended to store and track data migration state. The table is going to be instantiated only on shard 0, as access to the full migration data is not performance critical. The table drives updates of migrated resources and validates migration state transitions. Signed-off-by: Michał Maślanka <[email protected]>
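A minimal sketch of a state-transition check such a table could apply before accepting an update. The states and the allowed transitions are hypothetical and chosen only for illustration: forward steps along a linear path, plus cancellation of any migration that has not reached a terminal state.

```cpp
#include <cassert>

// Hypothetical states for illustration only.
enum class migration_state {
    planned, preparing, prepared, executing, executed, finished, cancelled
};

constexpr bool is_valid_transition(migration_state from, migration_state to) {
    using s = migration_state;
    switch (from) {
    case s::planned:   return to == s::preparing || to == s::cancelled;
    case s::preparing: return to == s::prepared || to == s::cancelled;
    case s::prepared:  return to == s::executing || to == s::cancelled;
    case s::executing: return to == s::executed || to == s::cancelled;
    case s::executed:  return to == s::finished || to == s::cancelled;
    case s::finished:
    case s::cancelled: return false; // terminal states
    }
    return false;
}
```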


Introduced RPCs allowing data migration related requests to be routed to the current controller leader. Signed-off-by: Michał Maślanka <[email protected]>

Introduced `cluster::data_migrations::frontend`. The frontend class is an entry point for the migration subsystem. It exposes an API allowing callers to interact with data migrations. Signed-off-by: Michał Maślanka <[email protected]>
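A minimal, synchronous sketch of the entry-point pattern: callers always go through the frontend, which handles a request locally when this node leads the controller and otherwise forwards it to the leader. All names (`frontend_sketch`, `create_migration_request`, `handled_by`) are hypothetical, and the real frontend is asynchronous and would replicate a controller command or issue an actual RPC where the sketch just returns.

```cpp
#include <cassert>
#include <optional>
#include <string>

using node_id = int;

// Hypothetical request/reply types.
struct create_migration_request {
    std::string description;
};

struct create_migration_reply {
    bool success;
    std::optional<node_id> handled_by;
};

class frontend_sketch {
public:
    frontend_sketch(node_id self, std::optional<node_id> leader)
      : _self(self), _leader(leader) {}

    create_migration_reply create_migration(const create_migration_request& req) {
        if (!_leader) {
            return {false, std::nullopt}; // no known leader; caller may retry
        }
        if (*_leader == _self) {
            return handle_locally(req);
        }
        return forward_to(*_leader, req);
    }

private:
    create_migration_reply handle_locally(const create_migration_request&) {
        // Placeholder for replicating a controller command.
        return {true, _self};
    }

    create_migration_reply forward_to(node_id leader, const create_migration_request&) {
        // Placeholder for an RPC to the leader's service handler.
        return {true, leader};
    }

    node_id _self;
    std::optional<node_id> _leader;
};
```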

Added an RPC service handler for the data migration subsystem. Signed-off-by: Michał Maślanka <[email protected]>

Introduced a placeholder for the data migrations backend component. Signed-off-by: Michał Maślanka <[email protected]>


Added admin server APIs under the `/v1/migrations` path, allowing external clients to interact with the migrations subsystem. Signed-off-by: Michał Maślanka <[email protected]>


When a topic is being migrated, we cannot allow topic property or partition updates. Added validation preventing creation of a topic with the same name as a topic in an inbound migration, as well as topic property updates and topic deletion. Signed-off-by: Michał Maślanka <[email protected]>
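A minimal sketch of the validation rules listed in this commit message, using hypothetical names (`topic_op`, `migration_view`, `validation_result`) rather than the real controller types. Creation is checked against names reserved by inbound migrations; property updates, partition updates and deletion are rejected for any topic taking part in a migration.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

enum class topic_op { create, update_properties, update_partitions, remove };

enum class validation_result { ok, name_reserved_by_migration, topic_being_migrated };

// Hypothetical view of migration state relevant to topic validation.
struct migration_view {
    std::unordered_set<std::string> inbound_topics;  // to be created by inbound migrations
    std::unordered_set<std::string> migrated_topics; // existing topics being migrated
};

validation_result validate(topic_op op, const std::string& topic, const migration_view& v) {
    const bool inbound = v.inbound_topics.count(topic) > 0;
    const bool migrated = v.migrated_topics.count(topic) > 0;
    switch (op) {
    case topic_op::create:
        // A new topic must not take a name reserved by an inbound migration.
        return inbound ? validation_result::name_reserved_by_migration
                       : validation_result::ok;
    case topic_op::update_properties:
    case topic_op::update_partitions:
    case topic_op::remove:
        return (inbound || migrated) ? validation_result::topic_being_migrated
                                     : validation_result::ok;
    }
    return validation_result::ok;
}
```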

bashtanov

LGTM, but needs checks for duplicate topics and groups in a migration

mmaslankaprv · 2024-06-28T12:40:54Z

ci failure: #19953

mmaslankaprv requested a review from a team as a code owner June 25, 2024 13:33

github-actions bot added the area/redpanda label Jun 25, 2024

mmaslankaprv requested review from bharathv, bashtanov and ztlpn June 25, 2024 14:20

bashtanov reviewed Jun 25, 2024


dotnwat reviewed Jun 27, 2024


m/record_batch_types: added data migration command batch type

de3d776

Signed-off-by: Michał Maślanka <[email protected]>

mmaslankaprv force-pushed the migrations-infra-rebased branch from 0020d7b to 388c77a June 27, 2024 06:18

mmaslankaprv added 4 commits June 27, 2024 06:20

c/features: added feature gating data migration related commands

9d0660b

Added a feature that needs to be active for Redpanda to support data migrations. Signed-off-by: Michał Maślanka <[email protected]>

c/logger: introduced data migrations logger

e3957ea

Introduced a separate logger for data migrations to easily separate all log entries related to migrating data across clusters. Signed-off-by: Michał Maślanka <[email protected]>

c/dm_types: introduced basic types representing data migrations

8d5e649

Introduced types representing inbound and outbound data migrations together with their state and related metadata. Signed-off-by: Michał Maślanka <[email protected]>

c/commands: introduced data migration management command

c3a653e

Introduced commands to manage data migrations. The commands represent creation, update and deletion of a migration. Signed-off-by: Michał Maślanka <[email protected]>

mmaslankaprv force-pushed the migrations-infra-rebased branch 2 times, most recently from 2bf643c to 7472a87 June 27, 2024 10:07

mmaslankaprv requested review from dotnwat and bashtanov June 27, 2024 10:07

mmaslankaprv force-pushed the migrations-infra-rebased branch from 7472a87 to d29fff0 June 27, 2024 10:11