Hi,

I am investigating a "suicide timeout" during scrubbing reported by our users. The cluster is running jewel 10.2.10, and a distinguishing feature of it is that it has a cache pool for which, for some reason, a pg scrub generates many errors like this:

osd.779 found snap mapper error on pg 56.11b oid 56:d8ba1d80:::rbd_data.924e5c33f6228e7.0000000000038670:2f3 snaps missing in mapper, should be: 2f3...repaired

And after one such error the scrub thread gets stuck waiting for the apply_transaction call that fixes the snap mapper to return:

#0  0x00007f64d251d0ff in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x000055ba77c099eb in Cond::Wait (mutex=..., this=0x7f6490a0d640) at common/Cond.h:56
#2  ObjectStore::apply_transactions (this=this@entry=0x55ba81fd2000, osr=osr@entry=0x55ba81ee4db0, tls=..., ondisk=ondisk@entry=0x0) at os/ObjectStore.cc:173
#3  0x000055ba77a1c5ed in ObjectStore::apply_transaction (ondisk=0x0, t=..., osr=0x55ba81ee4db0, this=0x55ba81fd2000) at os/ObjectStore.h:1855
#4  PG::_scan_snaps (this=this@entry=0x55ba9631a800, smap=...) at osd/PG.cc:3924
#5  0x000055ba77a1d031 in PG::build_scrub_map_chunk (this=this@entry=0x55ba9631a800, map=..., start=..., end=..., deep=deep@entry=true, seed=seed@entry=4294967295, handle=...) at osd/PG.cc:3967
#6  0x000055ba77a2c56d in PG::chunky_scrub (this=this@entry=0x55ba9631a800, handle=...) at osd/PG.cc:4431
#7  0x000055ba77a2df5c in PG::scrub (this=0x55ba9631a800, queued=<optimized out>, handle=...) at osd/PG.cc:4166
#8  0x000055ba779283c7 in PGQueueable::RunVis::operator() (this=this@entry=0x7f6490a0e820, op=...) at osd/OSD.cc:171

It looks like the transaction successfully reached FileStore::_finish_op, and the `onreadable` context (which the scrub thread is waiting for) was queued to the filestore apply finisher.
But the filestore apply finisher got stuck completing a PG::C_UpdateLastRollbackInfoTrimmedToApplied context that wants the pg lock, which is currently held by the scrub thread:

#0  0x00007f64d251fb7c in __lll_lock_wait () from ./lib64/libpthread.so.0
#1  0x00007f64d251a7d3 in pthread_mutex_lock () from ./lib64/libpthread.so.0
#2  0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#3  0x000055ba779eb650 in PG::lock (this=0x55ba9631a800, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:275
#4  0x000055ba77a40f74 in PG::C_UpdateLastRollbackInfoTrimmedToApplied::finish (this=0x55ba8f3542b0) at osd/PG.h:534
#5  0x000055ba77986199 in Context::complete (this=0x55ba8f3542b0, r=<optimized out>) at include/Context.h:64
#6  0x000055ba779c483b in finish_contexts<Context> (cct=0x0, finished=..., result=0) at include/Context.h:120
#7  0x000055ba77986ad9 in Context::complete (r=<optimized out>, this=0x55ba99ab8c80) at include/Context.h:64
#8  C_ContextsBase<Context, Context>::complete (this=0x55ba99ab8c80, r=<optimized out>) at include/Context.h:194
#9  0x000055ba77ef40a6 in Finisher::finisher_thread_entry (this=0x55ba81fd62c0) at common/Finisher.cc:68
#10 0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
#11 0x00007f64d0067e8d in clone () from ./lib64/libc.so.6

To confirm that the lock is held by the scrub thread, the holder thread id:

#fr 2
#2  0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
110       r = pthread_mutex_lock(&_m);
#p/x locked_by
$12 = 0x7f6490a10700

And from the crash log, 0x7f6490a10700 is our scrub thread:

7f6490a10700 -1 log_channel(cluster) log [ERR] : osd.779 found snap mapper error on pg 56.11b oid 56:d88e91fb:::rbd_data.924e5c33f6228e7.000000000003ab75:2f3 snaps missing in mapper, should be: 2f3...repaired
7f6484bb4700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had timed out after 15
7f64cc502700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had suicide timed out after 150

And to confirm that this finisher (0x55ba81fd62c0) is FileStore::apply_finishers[0]:

#bt
#0  0x00007f64d251d4a8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
#1  0x000055ba77cb9ac9 in Cond::WaitUntil (when=..., mutex=..., this=0x55ba81fd2530) at common/Cond.h:72
#2  Cond::WaitInterval (this=this@entry=0x55ba81fd2530, cct=<optimized out>, mutex=..., interval=...) at common/Cond.h:81
#3  0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
#4  0x000055ba77cc062d in FileStore::SyncThread::entry (this=<optimized out>) at os/filestore/FileStore.h:185
#5  0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
#6  0x00007f64d0067e8d in clone () from ./lib64/libc.so.6

#fr 3
#3  0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
3717      sync_cond.WaitInterval(g_ceph_context, lock, max_interval);
#p *(apply_finishers._M_impl._M_start)
$10 = (Finisher *) 0x55ba81fd62c0

So if my analysis is correct, it is unsafe to call ObjectStore::apply_transaction in PG::_scan_snaps while holding the pg lock, or we should be careful not to acquire the pg lock in contexts that are completed by FileStore::apply_finishers (like PG::C_UpdateLastRollbackInfoTrimmedToApplied).

I am not sure if this still applies to master, as the code has diverged, but the same issue looks to be present at least in luminous.

I suppose a workaround for this particular user could be to try increasing filestore_apply_finisher_threads (are there any potential issues with this?). Or could temporarily disabling snap trim (until all pgs are scrubbed) avoid triggering PG::C_UpdateLastRollbackInfoTrimmedToApplied?

--
Mykola Golub