[CRASH] Redis 7.2.x crashes in activeDefragCycle when activedefrag disabled while running and re-enabled #13307

Closed
stevelipinski opened this issue May 30, 2024 · 3 comments · Fixed by #13315

Comments

@stevelipinski
Contributor

Discovered while investigating issue #13205.

Crash report

=== REDIS BUG REPORT START: Cut & paste starting from here ===
19766:M 29 May 2024 14:29:44.134 # Redis 7.2.3 crashed by signal: 11, si_code: 1
19766:M 29 May 2024 14:29:44.134 # Accessing address: 0x48
19766:M 29 May 2024 14:29:44.134 # Crashed running the instruction at: 0x48a9bc

------ STACK TRACE ------
EIP:
redis-server *:63791 [cluster](defragLaterStep+0x4c)[0x48a9bc]

Backtrace:
/lib64/libpthread.so.0(+0x12cf0)[0x7f11c991fcf0]
redis-server *:63791 [cluster](defragLaterStep+0x4c)[0x48a9bc]
redis-server *:63791 [cluster](activeDefragCycle+0x3b4)[0x48b0f4]
redis-server *:63791 [cluster](databasesCron+0x6c)[0x5668ec]
redis-server *:63791 [cluster](serverCron+0x64a)[0x5691da]
redis-server *:63791 [cluster][0x56248d]
redis-server *:63791 [cluster](aeMain+0x1d8)[0x563c58]
redis-server *:63791 [cluster](main+0x39a)[0x450d4a]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f11c9582d85]
redis-server *:63791 [cluster](_start+0x2e)[0x45147e]

------ REGISTERS ------
19766:M 29 May 2024 14:29:44.135 #
RAX:0000000000000000 RBX:0000000000000000
RCX:0000000000000000 RDX:0000000000000000
RDI:0000000000000000 RSI:0000000000000000
RBP:0000000000000000 RSP:00007ffe5c2adab0
R8 :000000000036c9c2 R9 :00007ffe5c34a080
R10:00007ffe5c2adb30 R11:0000000000000002
R12:00000000000002c0 R13:0006199bef354748
R14:0000000000000000 R15:0000000000000000
RIP:000000000048a9bc EFL:0000000000010246
CSGSFS:002b000000000033
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adabf) -> 0000000000000000
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adabe) -> 0006199bef354748
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adabd) -> 0000000000000000
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adabc) -> 0000000000000014
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adabb) -> 0000000000020d20
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adaba) -> 0000000066577418
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab9) -> 0000000000481daf
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab8) -> 00000000000002c0
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab7) -> 0000000000000000
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab6) -> 000000050a98c3d4
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab5) -> 0000000000054e20
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab4) -> 000000050a98c3d4
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab3) -> 00000000004809c0
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab2) -> 00000000004808d0
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab1) -> 0000000000000001
19766:M 29 May 2024 14:29:44.135 # (00007ffe5c2adab0) -> 0000000000000000
...

------ DUMPING CODE AROUND EIP ------
Symbol: defragLaterStep (base: 0x48a970)
Module: redis-server *:63791 [cluster] (base 0x400000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x48a970 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
19766:M 29 May 2024 14:29:44.272 # dump of function (hexdump of 204 bytes):
41574531ff41564989fe41554989f5415455534883ec68488b2d92734600f30f7e0d127c19004c8b25a3734600488b15844c46000f160d657b19000f294c2410488b35794c46004885d2755a498b7e48488b074885f6742f483970100f85510300004889c6e8d6870d00498b464848c7053f4c46000000000048c7053c4c460000000000488b004885c00f841c03000048c7051d4c460000000000488b70104889351a4c4600498b3ee852d00d004889c3488b05f87246004889442420e9b500000066662e0f1f8400000000
Function at 0x5631b0 is listDelNode
Function at 0x567a70 is dictFind
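
The faulting address 0x48 is what a read of a field at offset 0x48 from a NULL struct pointer would look like, and several registers in the dump (R14 among them) are zero. As a rough illustration only, the sketch below declares a hypothetical `model_db` struct and computes where its `defrag_later` member would sit. The field list is an assumption loosely modelled on the 7.2 `redisDb` (names and order are recalled, not copied from the source), and the result depends on a typical 64-bit ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for redisDb. The members and their order are an
 * assumption for illustration, not a copy of the Redis 7.2 definition. */
typedef struct model_db {
    void *dict;
    void *expires;
    void *blocking_keys;
    void *blocking_keys_unblock_on_nokey;
    void *ready_keys;
    void *watched_keys;
    int id;                        /* followed by padding on 64-bit ABIs */
    long long avg_ttl;
    unsigned long expires_cursor;
    void *defrag_later;            /* list of keys to defrag later */
} model_db;

/* Address that would be read when evaluating db->defrag_later,
 * given the numeric value of the db pointer. */
uintptr_t defrag_later_address(uintptr_t db_address) {
    return db_address + offsetof(model_db, defrag_later);
}
```

With this layout, `defrag_later_address(0)` evaluates to 0x48 on a 64-bit build, which matches the "Accessing address" line if `db` was NULL when defragLaterStep touched its deferred-key list.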

Additional information

  • Running in cluster mode
  • loaded up with simple string keys with random (short) TTLs
  • while defrag is running, CONFIG SET activedefrag off; CONFIG SET activedefrag on

The fix is to reset `expires_cursor` in the "disabled mid-run" if block of activeDefragCycle() in defrag.c.

@stevelipinski
Contributor Author

Fix:
stevelipinski@ecb7cd8

@sundb
Collaborator

sundb commented May 31, 2024

@stevelipinski Thanks. You can open a PR with the fix instead of an issue.

sundb added a commit that referenced this issue Jun 18, 2024
… reset (#13315)

This PR fixes two crashes:

1. Fix missing slotToKeyInit() when using `flushdb async` under cluster
mode.
    #13205

2. Fix missing expires_cursor reset when stopping active defrag in the
middle of defragmentation.
    #13307
If active defrag is stopped while db->expires is being defragged and
`expires_cursor` is not reset to 0, then the next time active defrag is
enabled, defragLaterStep(db, ...) is entered without a database being
selected first. By then `db` has been reset to NULL, which results in a crash.

The affected code was removed by #11695 and #13058 in unstable, so we
just need to backport this to 7.2.
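
The sequence described in the commit message can be sketched as a tiny state machine. This is a simplified model, not Redis source: `defrag_state`, `stop_mid_run`, `cycle_would_use_null_db` and `scenario_crashes` are hypothetical names, and the only behaviour assumed is what the message states, namely that a database is selected only when no scan is in progress and that the mid-run stop path sets `db` to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not Redis source. */
typedef struct model_db { int id; } model_db;

typedef struct defrag_state {
    model_db *db;                 /* database being scanned, NULL when idle */
    unsigned long cursor;         /* position in the main keyspace scan */
    unsigned long expires_cursor; /* position in the expires scan */
    bool running;
} defrag_state;

/* Models "active defrag was disabled mid-run". The flag selects whether
 * the reset of expires_cursor (the fix) is applied. */
void stop_mid_run(defrag_state *s, bool reset_expires_cursor) {
    s->running = false;
    s->cursor = 0;
    s->db = NULL;
    if (reset_expires_cursor) s->expires_cursor = 0;
}

/* Models the start of the next cycle. Returns true if the deferred-key
 * step would be handed a NULL db. */
bool cycle_would_use_null_db(defrag_state *s, model_db *next_db) {
    s->running = true;
    /* A database is picked only when no scan is in progress. */
    if (s->cursor == 0 && s->expires_cursor == 0) {
        s->db = next_db;
    }
    return s->db == NULL;
}

/* Scenario from the issue: stop during the expires scan, then re-enable. */
bool scenario_crashes(bool reset_expires_cursor) {
    model_db db0 = { 0 };
    defrag_state s = { &db0, 0, 42, true }; /* part-way through expires */
    stop_mid_run(&s, reset_expires_cursor);
    return cycle_would_use_null_db(&s, &db0);
}
```

In this model, `scenario_crashes(false)` returns true because the stale `expires_cursor` makes the next cycle skip database selection, while `scenario_crashes(true)` returns false because the reset lets the cycle start from a fresh database.
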
@sundb
Collaborator

sundb commented Jun 21, 2024

Fixed via #13315

@sundb sundb closed this as completed Jun 21, 2024