Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial infrastructure for data migrations #20134

Merged
merged 17 commits into from
Jun 28, 2024

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Jun 25, 2024

This PR introduces cluster services for data migrations. The services are used for CRUD operations over the data migration abstraction and to provide an interfaces for the other subsystems to interact with the migrations.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

src/v/features/feature_table.cc Outdated Show resolved Hide resolved
src/v/features/feature_table.h Show resolved Hide resolved
src/v/cluster/data_migration_types.cc Show resolved Hide resolved
src/v/cluster/commands.h Outdated Show resolved Hide resolved
.migration_id = id,
.state = target_state,
});
it->second.state = target_state;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it any better than _topics[t] = {id, target_state}?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it any better than _topics[t] = {id, target_state}?

IIUC _topic[t] will first create a default value, and then copy the RHS over it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's not going to be more work. This default value creation is going to be just an allocation, without even zeroing out the members. We anyway create a temporary resource_metadata value before the call to the map, and we anyway create a map entry with the right key if it's not there yet. The only difference is how many bytes we copy into the map entry in case of insert/update, and I doubt it would make difference even in case of a hot path:

  • Current code, insert: copy the whole value, then copy target_state again.
  • Current code, update: just copy the target_state.
  • Suggested simple code, insert: copy the whole value.
  • Suggested simple code, update: copy the whole value.
    No clear winner for performance, but clear winner for readability.

src/v/cluster/data_migration_table.cc Outdated Show resolved Hide resolved
src/v/cluster/data_migration_table.cc Outdated Show resolved Hide resolved
src/v/cluster/data_migration_table.cc Outdated Show resolved Hide resolved
src/v/cluster/controller.cc Show resolved Hide resolved
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a fair amount of this PR can merge in smaller PRs

src/v/cluster/logger.cc Outdated Show resolved Hide resolved
}

std::ostream& operator<<(std::ostream& o, const cloud_storage_location&) {
fmt::print(o, "{{}}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cloud_storage_location is a placeholder empty type for now

* same identifiers.
*/
using data_migration_id = named_type<int64_t, struct data_migration_type_tag>;
using consumer_group = named_type<ss::sstring, struct consumer_group_tag>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking that consumer_group seems like a pretty generic name to have at the cluster namespace level. i wonder if you need a data_migration namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we have a kafka::group_id. This is not really perfect to have it in cluster, i will move it into separate namespace

src/v/cluster/data_migration_types.h Outdated Show resolved Hide resolved
src/v/cluster/data_migration_types.h Outdated Show resolved Hide resolved
src/v/cluster/data_migrated_resources.cc Outdated Show resolved Hide resolved
src/v/cluster/data_migrated_resources.cc Outdated Show resolved Hide resolved
src/v/cluster/data_migration_table.cc Outdated Show resolved Hide resolved
Comment on lines 126 to 127
data_migration_id _next_id{0};
data_migration_id _last_applied{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected difference here for initialization values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_last_applied is intialized to min value of int64_t to indicate that nothing was applied, while we start assinging migration ids from 0

Added a feature that needs to be active for Redpanda to support data
migrations.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced separate logger for data migrations to easily separate all
log entries related with migrating data across the clusters.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced types representing inbound and outbound data migration types
together with the state and related metadata.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced commands to manage data migrations. The commands represent
creation, update and deletion of migration.

Signed-off-by: Michał Maślanka <[email protected]>
@mmaslankaprv mmaslankaprv force-pushed the migrations-infra-rebased branch 2 times, most recently from 2bf643c to 7472a87 Compare June 27, 2024 10:07
@mmaslankaprv mmaslankaprv force-pushed the migrations-infra-rebased branch 2 times, most recently from 305942b to 7c6d105 Compare June 27, 2024 11:16
ss::future<> migrations_table::apply_snapshot(
model::offset, const controller_snapshot& snapshot) {
_next_id = snapshot.data_migrations.next_id;
_migrations.reserve(_migrations.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function also does not account for removing entries, I'll be fixing it in my PR so no action needed here.

src/v/cluster/fwd.h Outdated Show resolved Hide resolved
Introduced a class that is going to be instantiated on every shard and
will contain information about migrated resources. The class is
intended to be used by a validation logic in hot path where migration
information will be queried to block writes and properties updates.

Signed-off-by: Michał Maślanka <[email protected]>
.migration_id = id,
.state = migrated_resource_state::blocked,
});
vassert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of 2 situations where one of these asserts can fail:

  1. duplicate topics or groups in a single migration
  2. race condition: can migrated resources asynchronous calls be reordred?

}

private:
friend class migration_frontend;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change it to frontend in this commit?

@@ -84,6 +84,7 @@ class rm_stm;
namespace data_migrations {
class migrated_resources;
class migrations_table;
class migration_frontend;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: change it to frontend here rather than in a later commit

@@ -0,0 +1,343 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message for the frontend commit refers to wrong class name

Introduce a data migration table that is intended to store and track
data migration state. The table is going to be instantiated only on
shard 0 as it is not performance critical to access full migration data.
The table is driving migrated resources updates and validates the
migration state transitions.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced RPCs allowing routing data migration related requests to
current controller leader.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced `cluster::data_migrations::frontend`. Frontend class is an
entry point for the migration subsystem. It exposes API allowing caller
to interact with data migrations.

Signed-off-by: Michał Maślanka <[email protected]>
Added an RPC service handler for data migration subsystem.

Signed-off-by: Michał Maślanka <[email protected]>
Introduced placeholder for data migrations backend component.

Signed-off-by: Michał Maślanka <[email protected]>
Added admin server APIs in `/v1/migartions` path allowing external
clients to interact with migrations subsystem

Signed-off-by: Michał Maślanka <[email protected]>
When topic is being migrated we can not allow the topic properties and
partition updates. Added validation preventing creation of the topic
with the same name as the name on inbound migration topic, topic
property updates and topic deletion.

Signed-off-by: Michał Maślanka <[email protected]>
Copy link
Contributor

@bashtanov bashtanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but needs checks for duplicate topics and groups in a migration

@mmaslankaprv
Copy link
Member Author

ci failure: #19953

@mmaslankaprv mmaslankaprv merged commit 5f07574 into redpanda-data:dev Jun 28, 2024
16 of 19 checks passed
@mmaslankaprv mmaslankaprv deleted the migrations-infra-rebased branch June 28, 2024 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants