Allow Write Operations when Slot is in MIGRATING state #474

Merged
merged 43 commits into from
Jun 27, 2024

@vazois vazois commented Jun 18, 2024

This PR resolves issue #354.

Tasks:

  • Consolidate slot verification methods.
  • Introduce BDN benchmark for cluster.
  • Add unit tests to validate cluster redirect functionality.
  • Implement key level access control.
  • Benchmark standalone cluster for this PR and main.

Overview of Migration

Slot migration proceeds in several stages that control data access, preserving data integrity and high availability while keys are transferred from the source node to the target node.
The MIGRATE command supports two transfer options: MIGRATE KEYS and MIGRATE SLOTS.
Both options use a common data access control interface, which is logically divided into the following categories:

  1. Slot level access control
    Used to orchestrate migration by changing the state of the associated slot (i.e., MIGRATING at the source node and IMPORTING at the target node).
  2. Key level access control
    Used to control access to individual keys in order to ensure high availability and data integrity.

Key level access control

Each migrate session maintains a dictionary of <key, KeyMigrationStatus> pairs.
This dictionary controls access to the keys that are actively managed by that running migrate session.
When a slot is being migrated and no active migrate session manages a given key, requests for that key are handled under the assumption that the key exists. If it does not, an -ASK redirect is returned that points to the endpoint of the target node.
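
The fallback rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `handle_unmanaged_key`, the in-memory `store` dictionary, and the `"SERVE_LOCALLY"` marker are hypothetical, and the redirect string assumes the Redis cluster convention `-ASK <slot> <host>:<port>`.

```python
from typing import Dict


def handle_unmanaged_key(store: Dict[bytes, bytes], key: bytes,
                         slot: int, target_endpoint: str) -> str:
    """Decide how to answer a request for a key in a MIGRATING slot
    when no migrate session currently manages that key.

    Hypothetical helper: names and return values are illustrative only.
    """
    if key in store:
        # The key still lives on the source node, so serve it here.
        return "SERVE_LOCALLY"
    # The key is absent: redirect the client to the target node.
    # Format assumed from the Redis cluster convention.
    return f"-ASK {slot} {target_endpoint}"


store = {b"user:1": b"alice"}
print(handle_unmanaged_key(store, b"user:1", 42, "10.0.0.2:7001"))
print(handle_unmanaged_key(store, b"user:2", 42, "10.0.0.2:7001"))
```

The first call finds the key and is served by the source node; the second produces an -ASK redirect to the target endpoint.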

The KeyMigrationStatus will affect individual session readers and writers as follows:

  • QUEUED (QD)
    Key owned by a specific MigrateSession but is not actively being migrated.
    Reads and writes can be served if the keys exist.
  • MIGRATING (MG)
    Key is actively being migrated by a specific MigrateSession.
    Writes will be delayed until the status goes back to QUEUED or advances to MIGRATED.
    Reads can be served without any restriction.
  • DELETING (DL)
    Key is being deleted after it was sent to the target node.
    Reads and writes will be delayed.
    We need to delay reads to avoid the scenario where a key exists during validation but was deleted before read executes.
  • MIGRATED (MD)
    Key is owned by a specific MigrateSession and has completed all the steps required to be MIGRATED to the target node.
    This does not mean that the key existed or had not expired, only that all the steps associated with MIGRATED have completed.
    This can happen for a key that was provided as an argument to the MIGRATE command but did not exist, or had expired, in both the main and object stores.
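
The per-status rules above can be condensed into a small lookup. This is only a sketch: the enum mirrors the status names in this description, while `must_delay` is a hypothetical helper. Treating MIGRATED as non-blocking is an assumption inferred from "writes will be delayed until status goes back to QUEUED or MIGRATED".

```python
from enum import Enum


class KeyMigrationStatus(Enum):
    QUEUED = "QD"
    MIGRATING = "MG"
    DELETING = "DL"
    MIGRATED = "MD"


def must_delay(status: KeyMigrationStatus, is_write: bool) -> bool:
    """Return True if a request must wait for the key's status to change.

    Hypothetical helper; MIGRATED is assumed not to block (see lead-in).
    """
    if status is KeyMigrationStatus.DELETING:
        # Both reads and writes wait while the key is being deleted.
        return True
    if status is KeyMigrationStatus.MIGRATING:
        # Only writes wait while the key is being sent to the target.
        return is_write
    # QUEUED and MIGRATED do not block; key existence is checked afterwards.
    return False


for status in KeyMigrationStatus:
    print(status.name,
          "read delayed:", must_delay(status, is_write=False),
          "write delayed:", must_delay(status, is_write=True))
```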

Key Transfer State Machine Algorithm

  1. Add keys to dictionary and initialize KeyMigrationStatus to QUEUED
  2. Perform for main and object store separately
    1. Transition keys from QUEUED to MIGRATING state
    2. Wait for the status change to propagate using epoch protection.
    3. For every key in MIGRATING state perform the following:
      1. Look up the key in the given store.
      2. If the key is found, send it to the target node (in batches).
      3. If the key is not found, change its state back to QUEUED to unblock any writers.
    4. If copy option is disabled perform the following:
      1. Transition keys from MIGRATING to DELETING.
      2. Wait for the status change to propagate.
      3. For every key in DELETING state, delete it and change its state to MIGRATED.
    5. If copy option is enabled transition keys from MIGRATING to MIGRATED state.
graph TD;
    QD-->MG;
    MG-->DL;
    MG-->QD;
    MG-->MD;
    DL-->MD;
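
As a rough illustration of the algorithm and the transition graph above, the following sketch simulates one pass over a single store. It is not the PR's implementation: `ALLOWED`, `transition`, `migrate_store`, and the `send` callback are hypothetical names, plain dictionaries stand in for the store and the session dictionary, and the epoch-protected waits are reduced to comments.

```python
from typing import Callable, Dict, List, Set, Tuple

# Edges of the transition graph (QD, MG, DL, MD).
ALLOWED: Dict[str, Set[str]] = {
    "QD": {"MG"},
    "MG": {"DL", "QD", "MD"},
    "DL": {"MD"},
    "MD": set(),
}


def transition(status: Dict[str, str], key: str, new: str) -> None:
    """Move a key to a new status, rejecting edges not in the graph."""
    old = status[key]
    if new not in ALLOWED[old]:
        raise ValueError(f"illegal transition {old} -> {new} for {key!r}")
    status[key] = new


def migrate_store(status: Dict[str, str], store: Dict[str, int], copy: bool,
                  send: Callable[[List[Tuple[str, int]]], None]) -> None:
    """Run steps 2.1 to 2.5 for one store (simplified, single-threaded)."""
    queued = [k for k, s in status.items() if s == "QD"]
    for k in queued:                      # 2.1 QUEUED -> MIGRATING
        transition(status, k, "MG")
    # 2.2 a real system waits here for the status change to propagate
    batch: List[Tuple[str, int]] = []
    for k in queued:                      # 2.3 look up and batch each key
        if k in store:
            batch.append((k, store[k]))
        else:
            transition(status, k, "QD")   # not found: unblock writers
    send(batch)
    moving = [k for k in queued if status[k] == "MG"]
    if copy:                              # 2.5 copy enabled: keep the data
        for k in moving:
            transition(status, k, "MD")
        return
    for k in moving:                      # 2.4.1 MIGRATING -> DELETING
        transition(status, k, "DL")
    # 2.4.2 a real system waits again for propagation
    for k in moving:                      # 2.4.3 delete, then MIGRATED
        del store[k]
        transition(status, k, "MD")


store = {"a": 1, "b": 2}
status = {"a": "QD", "b": "QD", "c": "QD"}
sent: List[Tuple[str, int]] = []
migrate_store(status, store, copy=False, send=sent.extend)
print(status, store, sent)
```

In this run keys `a` and `b` end in MD and are removed from the store, while `c`, which was not found, returns to QD.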

RespClusterBench - Slot in STABLE state

Main

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|--------|-----|----------------------|---------|------|-------|--------|-----------|
| Get | .NET 6 | Empty | .NET 6.0 | 28.82 us | 0.078 us | 0.069 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 34.55 us | 0.053 us | 0.047 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 25.68 us | 0.018 us | 0.017 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 26.02 us | 0.033 us | 0.031 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 21.71 us | 0.043 us | 0.038 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 20.69 us | 0.037 us | 0.035 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 19.08 us | 0.049 us | 0.046 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 19.02 us | 0.011 us | 0.010 us | - |

PR #474

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|--------|-----|----------------------|---------|------|-------|--------|-----------|
| Get | .NET 6 | Empty | .NET 6.0 | 28.14 us | 0.019 us | 0.017 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 29.66 us | 0.020 us | 0.017 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 24.48 us | 0.049 us | 0.045 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 28.09 us | 0.244 us | 0.228 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 21.16 us | 0.011 us | 0.009 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 20.34 us | 0.012 us | 0.011 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 18.24 us | 0.045 us | 0.042 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 17.25 us | 0.044 us | 0.041 us | - |

Diff (%)

| Method | Job | EnvironmentVariables | Runtime | Mean |
|--------|-----|----------------------|---------|------|
| Get | .NET 6 | Empty | .NET 6.0 | 2.36 % |
| Set | .NET 6 | Empty | .NET 6.0 | 14.15 % |
| MGet | .NET 6 | Empty | .NET 6.0 | 4.67 % |
| MSet | .NET 6 | Empty | .NET 6.0 | -7.96 % |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 2.53 % |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 1.69 % |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 4.4 % |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 9.31 % |
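
The Diff (%) values appear to be computed as (Main - PR) / Main × 100, so a positive value means the PR has a lower mean latency than main. This formula is inferred from the reported numbers, not stated in the description; `diff_percent` is a hypothetical helper used only to check that reading.

```python
def diff_percent(main_mean_us: float, pr_mean_us: float) -> float:
    """Relative change in mean latency, in percent.

    Positive: the PR is faster than main. Negative: the PR is slower.
    The formula is an inference from the tables, not taken from the PR.
    """
    return round((main_mean_us - pr_mean_us) / main_mean_us * 100, 2)


# Get on .NET 6, slot in STABLE state: main 28.82 us, PR 28.14 us
print(diff_percent(28.82, 28.14))
# MSet on .NET 6, slot in STABLE state: main 26.02 us, PR 28.09 us
print(diff_percent(26.02, 28.09))
```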

RespClusterMigrateBench - Slot in MIGRATING state

Main

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|--------|-----|----------------------|---------|------|-------|--------|-----------|
| Get | .NET 6 | Empty | .NET 6.0 | 49.16 us | 0.222 us | 0.208 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 60.81 us | 0.688 us | 0.643 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 47.23 us | 0.175 us | 0.164 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 49.34 us | 0.197 us | 0.184 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 39.66 us | 0.150 us | 0.140 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 39.40 us | 0.020 us | 0.019 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 36.54 us | 0.017 us | 0.015 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 38.56 us | 0.086 us | 0.080 us | - |

PR #474

| Method | Job | EnvironmentVariables | Runtime | Mean | Error | StdDev | Allocated |
|--------|-----|----------------------|---------|------|-------|--------|-----------|
| Get | .NET 6 | Empty | .NET 6.0 | 52.76 us | 0.162 us | 0.151 us | - |
| Set | .NET 6 | Empty | .NET 6.0 | 64.02 us | 0.277 us | 0.259 us | - |
| MGet | .NET 6 | Empty | .NET 6.0 | 47.87 us | 0.151 us | 0.141 us | - |
| MSet | .NET 6 | Empty | .NET 6.0 | 49.47 us | 0.187 us | 0.175 us | - |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 40.23 us | 0.025 us | 0.020 us | - |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 41.76 us | 0.028 us | 0.026 us | - |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 37.38 us | 0.109 us | 0.097 us | - |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 36.85 us | 0.037 us | 0.033 us | - |

Diff (%)

| Method | Job | EnvironmentVariables | Runtime | Mean |
|--------|-----|----------------------|---------|------|
| Get | .NET 6 | Empty | .NET 6.0 | -7.32 % |
| Set | .NET 6 | Empty | .NET 6.0 | -5.28 % |
| MGet | .NET 6 | Empty | .NET 6.0 | -1.36 % |
| MSet | .NET 6 | Empty | .NET 6.0 | -0.26 % |
| Get | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -1.44 % |
| Set | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -5.99 % |
| MGet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | -2.3 % |
| MSet | .NET 8 | DOTNET_TieredPGO=0 | .NET 8.0 | 4.43 % |

@vazois vazois linked an issue Jun 18, 2024 that may be closed by this pull request
@vazois vazois force-pushed the vazois/allow-writes-during-migration branch from be71b9d to 0bad8ce Compare June 19, 2024 01:09
@vazois vazois marked this pull request as draft June 19, 2024 15:24
@vazois vazois changed the title from "Allow Write Operations when Slot in MIGRATING state" to "Allow Write Operations when Slot is in MIGRATING state" Jun 19, 2024
@vazois vazois force-pushed the vazois/allow-writes-during-migration branch 10 times, most recently from 6975349 to bd8b53a Compare June 26, 2024 17:25
@vazois vazois marked this pull request as ready for review June 26, 2024 18:27
@vazois vazois force-pushed the vazois/allow-writes-during-migration branch from 3531784 to b300712 Compare June 27, 2024 15:42
@vazois vazois force-pushed the vazois/allow-writes-during-migration branch from 77343b9 to 4867248 Compare June 27, 2024 21:58
@vazois vazois merged commit 001b69e into microsoft:main Jun 27, 2024
23 checks passed
Development

Successfully merging this pull request may close these issues.

Allow writing to existing keys mapped to a slot in migrating state.
2 participants