Initial infrastructure for data migrations #20134
    .migration_id = id,
    .state = target_state,
});
it->second.state = target_state;
Is it any better than _topics[t] = {id, target_state}?
Is it any better than _topics[t] = {id, target_state}?
IIUC _topics[t] will first create a default value, and then copy the RHS over it
Yes, but it's not going to be more work. This default value creation is going to be just an allocation, without even zeroing out the members. We create a temporary resource_metadata value before the call to the map anyway, and we create a map entry with the right key anyway if it's not there yet. The only difference is how many bytes we copy into the map entry on insert/update, and I doubt it would make a difference even on a hot path:
- Current code, insert: copy the whole value, then copy target_state again.
- Current code, update: just copy the target_state.
- Suggested simple code, insert: copy the whole value.
- Suggested simple code, update: copy the whole value.
No clear winner for performance, but clear winner for readability.
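To make the comparison above concrete, here is a minimal sketch of the two insert-or-update variants. It assumes a `std::unordered_map` and simplified, hypothetical stand-ins for `resource_metadata` and the state enum (the container and field types in the PR may differ), so treat it as an illustration rather than the code under review.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the types in the diff.
enum class resource_state { blocked, read_only };

struct resource_metadata {
    std::int64_t migration_id;
    resource_state state;
};

using topic_map = std::unordered_map<std::string, resource_metadata>;

// Shape of the current code: insert if absent, then overwrite only the state.
void upsert_emplace(
  topic_map& topics, const std::string& t, std::int64_t id, resource_state s) {
    auto [it, inserted] = topics.try_emplace(t, resource_metadata{id, s});
    (void)inserted;
    it->second.state = s;
}

// Shape of the suggested code: operator[] creates the entry if needed,
// then the whole value is assigned over it.
void upsert_assign(
  topic_map& topics, const std::string& t, std::int64_t id, resource_state s) {
    topics[t] = resource_metadata{id, s};
}
```

One thing the sketch makes visible: on update, the first variant keeps the previously stored `migration_id`, while the second overwrites it. If an update can never arrive with a different id, the two are equivalent. Also, for `std::unordered_map`, `operator[]` value-initializes the new mapped value before the assignment.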
seems like a fair amount of this PR can merge in smaller PRs
}

std::ostream& operator<<(std::ostream& o, const cloud_storage_location&) {
    fmt::print(o, "{{}}");
missing parameter?
the cloud_storage_location is a placeholder empty type for now
 * same identifiers.
 */
using data_migration_id = named_type<int64_t, struct data_migration_type_tag>;
using consumer_group = named_type<ss::sstring, struct consumer_group_tag>;
just thinking that consumer_group seems like a pretty generic name to have at the cluster namespace level. i wonder if you need a data_migration namespace?
yeah, we have a kafka::group_id. It's not ideal to have it in cluster, i will move it into a separate namespace
src/v/cluster/data_migration_table.h
Outdated
data_migration_id _next_id{0};
data_migration_id _last_applied{};
what is the expected difference here for initialization values?
_last_applied is initialized to the min value of int64_t to indicate that nothing was applied, while we start assigning migration ids from 0
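For readers unfamiliar with the convention described in this reply, a small sketch may help. `tagged_id` below is a hypothetical wrapper written only for illustration; it is not the real `named_type` and assumes nothing about it beyond what the reply states, namely that a default-constructed integral id holds the minimum value of the underlying type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical strongly typed integer wrapper, for illustration only.
template<typename T, typename Tag>
class tagged_id {
public:
    constexpr tagged_id() = default;
    constexpr explicit tagged_id(T v)
      : _value(v) {}

    constexpr T operator()() const { return _value; }

    friend constexpr bool operator==(tagged_id a, tagged_id b) {
        return a._value == b._value;
    }

private:
    // Default-constructed ids act as a "nothing yet" sentinel.
    T _value = std::numeric_limits<T>::min();
};

struct migration_id_tag;
using migration_id = tagged_id<std::int64_t, migration_id_tag>;

struct id_state {
    migration_id next_id{0};     // ids are handed out starting from 0
    migration_id last_applied{}; // sentinel: nothing applied so far

    bool anything_applied() const {
        return !(last_applied == migration_id{});
    }
};
```

With this shape, `{0}` and `{}` are deliberately different: the first is a valid id, the second is a marker that no id has been applied.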
Signed-off-by: Michał Maślanka <[email protected]>
0020d7b to 388c77a
Added a feature that needs to be active for Redpanda to support data migrations. Signed-off-by: Michał Maślanka <[email protected]>
Introduced a separate logger for data migrations to easily separate all log entries related to migrating data across clusters. Signed-off-by: Michał Maślanka <[email protected]>
Introduced types representing inbound and outbound data migration types together with the state and related metadata. Signed-off-by: Michał Maślanka <[email protected]>
Introduced commands to manage data migrations. The commands represent creation, update and deletion of a migration. Signed-off-by: Michał Maślanka <[email protected]>
2bf643c to 7472a87
7472a87 to d29fff0
305942b to 7c6d105
ss::future<> migrations_table::apply_snapshot(
  model::offset, const controller_snapshot& snapshot) {
    _next_id = snapshot.data_migrations.next_id;
    _migrations.reserve(_migrations.size());
🤣
This function also does not account for removing entries, I'll be fixing it in my PR so no action needed here.
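As a rough illustration of the two points raised on this hunk (the `reserve` call sizing the table from itself, and entries that are missing from the snapshot never being removed), here is a minimal sketch of snapshot application over a plain map. The key and value types are hypothetical placeholders, and the follow-up PR mentioned above may well solve this differently.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in: migration id -> some metadata.
using migration_map = std::unordered_map<std::int64_t, std::string>;

void apply_snapshot(migration_map& current, const migration_map& snapshot) {
    // Size the table from the snapshot, not from the table itself.
    current.reserve(snapshot.size());

    // Drop entries that no longer exist in the snapshot.
    for (auto it = current.begin(); it != current.end();) {
        if (snapshot.find(it->first) == snapshot.end()) {
            it = current.erase(it);
        } else {
            ++it;
        }
    }

    // Insert new entries and overwrite the ones that changed.
    for (const auto& [id, metadata] : snapshot) {
        current.insert_or_assign(id, metadata);
    }
}
```

After the call, `current` holds exactly the snapshot's entries. A real implementation would likely also need to notify other components about removed and updated migrations, which this sketch leaves out.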
7c6d105 to fbbfc33
Introduced a class that is going to be instantiated on every shard and will contain information about migrated resources. The class is intended to be used by a validation logic in hot path where migration information will be queried to block writes and properties updates. Signed-off-by: Michał Maślanka <[email protected]>
Signed-off-by: Michał Maślanka <[email protected]>
fbbfc33 to 6105133
    .migration_id = id,
    .state = migrated_resource_state::blocked,
});
vassert(
I can think of 2 situations where one of these asserts can fail:
- duplicate topics or groups in a single migration
- race condition: can the asynchronous migrated resources calls be reordered?
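For the first situation, one possible guard is to reject a migration that names the same topic or group twice before any state is touched, so the assert is never reached. The helper below is a hypothetical sketch over plain strings; the real code would use the PR's own topic and group types, and it does not address the reordering question.

```cpp
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the first name that appears more than once, if any.
// Hypothetical helper, for illustration only.
std::optional<std::string>
first_duplicate(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;
    seen.reserve(names.size());
    for (const auto& name : names) {
        // insert() reports via .second whether the element was newly added.
        if (!seen.insert(name).second) {
            return name;
        }
    }
    return std::nullopt;
}
```

A validation step could call this once for the topic list and once for the group list, and return an error to the caller when a duplicate is found.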
src/v/cluster/data_migration_table.h
Outdated
}

private:
    friend class migration_frontend;
nit: change it to frontend in this commit?
src/v/cluster/fwd.h
Outdated
@@ -84,6 +84,7 @@ class rm_stm;
namespace data_migrations {
class migrated_resources;
class migrations_table;
class migration_frontend;
nit: change it to frontend here rather than in a later commit
@@ -0,0 +1,343 @@
/*
commit message for the frontend commit refers to wrong class name
Introduce a data migration table that is intended to store and track data migration state. The table is going to be instantiated only on shard 0 as it is not performance critical to access full migration data. The table is driving migrated resources updates and validates the migration state transitions. Signed-off-by: Michał Maślanka <[email protected]>
Signed-off-by: Michał Maślanka <[email protected]>
Introduced RPCs allowing routing data migration related requests to current controller leader. Signed-off-by: Michał Maślanka <[email protected]>
6105133 to bab781d
Introduced `cluster::data_migrations::frontend`. Frontend class is an entry point for the migration subsystem. It exposes API allowing caller to interact with data migrations. Signed-off-by: Michał Maślanka <[email protected]>
Added an RPC service handler for data migration subsystem. Signed-off-by: Michał Maślanka <[email protected]>
Introduced placeholder for data migrations backend component. Signed-off-by: Michał Maślanka <[email protected]>
Signed-off-by: Michał Maślanka <[email protected]>
Added admin server APIs in `/v1/migrations` path allowing external clients to interact with migrations subsystem. Signed-off-by: Michał Maślanka <[email protected]>
Signed-off-by: Michał Maślanka <[email protected]>
When a topic is being migrated, we cannot allow topic property or partition updates. Added validation preventing creation of a topic with the same name as an inbound migration topic, topic property updates, and topic deletion. Signed-off-by: Michał Maślanka <[email protected]>
bab781d to b41b000
LGTM, but needs checks for duplicate topics and groups in a migration
ci failure: #19953
This PR introduces cluster services for data migrations. The services are used for CRUD operations over the data migration abstraction and provide interfaces for other subsystems to interact with migrations.
Backports Required
Release Notes