Bug Report: --enable-tx-throttler can lead to crash due to race in go/vt/throttler #16102

Open
timvaillancourt opened this issue Jun 11, 2024 · 0 comments · May be fixed by #16078

timvaillancourt (Contributor) commented Jun 11, 2024

Overview of the Issue

vttablet can crash due to concurrent iteration/writes to a map in go/vt/throttler/replication_lag_cache.go:

fatal error: concurrent map iteration and map write
goroutine 455 [running]:
vitess.io/vitess/go/vt/throttler.(*Throttler).MaxLag(0xc000bb4728?, 0xb94630?)
        vitess.io/vitess/go/vt/throttler/throttler.go:236 +0x9c
vitess.io/vitess/go/vt/vttablet/tabletserver/txthrottler.(*txThrottlerStateImpl).updateMaxLag(0xc0009f85a0)
        vitess.io/vitess/go/vt/vttablet/tabletserver/txthrottler/tx_throttler.go:399 +0x1b4
created by vitess.io/vitess/go/vt/vttablet/tabletserver/txthrottler.newTxThrottlerState in goroutine 99
        vitess.io/vitess/go/vt/vttablet/tabletserver/txthrottler/tx_throttler.go:321 +0x4f6

In our production environment, this crash takes a long time to occur, usually after 40-90 minutes.

Reading go/vt/throttler/replication_lag_cache.go makes the issue clear: entries in type replicationLagCache struct is updated and deleted from concurrently in .add() and read concurrently in .MaxLag(), with nothing serializing the two.

Reproduction Steps

  1. Start vttablet with --enable-tx-throttler and --healthcheck_interval 1s. More tablets increase the likelihood of a crash
  2. Cause a high rate of traffic that will hit the Transaction Throttler, which will call .MaxLag
  3. Wait for vttablet to crash with fatal error: concurrent map iteration and map write

Binary Version

v19 or, in our case, v15 with all txthrottler backports from v16-v19. The go/vt/throttler code in question is the same in both.

Operating System and Environment details

N/A

Log Fragments

No response

timvaillancourt added the Type: Bug and Needs Triage labels Jun 11, 2024
timvaillancourt linked a pull request Jun 11, 2024 that will close this issue
timvaillancourt self-assigned this Jun 11, 2024
timvaillancourt added the Component: Throttler and Component: VTTablet labels and removed the Needs Triage label Jun 11, 2024