Hi Mykola,

Good analysis.  This reminds me of 907b6281e99ece3677dd7b012cf4955731db6120,
which I think addresses this issue.  In particular, it waits for
applied_sync, which is after the write is applied but before the other
callbacks fire (which may block on the pg lock, as you note).

So I think this is resolved now in mimic...

sage


On Mon, 29 Oct 2018, Mykola Golub wrote:

> Hi,
>
> I am investigating a "suicide timeout" during scrubbing reported by
> our users.
>
> The cluster is running jewel 10.2.10, and a distinguishing feature of
> it is that it has a cache pool for which, for some reason, a pg scrub
> generates many errors like this:
>
> osd.779 found snap mapper error on pg 56.11b oid 56:d8ba1d80:::rbd_data.924e5c33f6228e7.0000000000038670:2f3 snaps missing in mapper, should be: 2f3...repaired
>
> After one such error the scrub thread gets stuck waiting for the
> apply_transaction that fixes the snap mapper to return:
>
> #0 0x00007f64d251d0ff in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
> #1 0x000055ba77c099eb in Cond::Wait (mutex=..., this=0x7f6490a0d640) at common/Cond.h:56
> #2 ObjectStore::apply_transactions (this=this@entry=0x55ba81fd2000, osr=osr@entry=0x55ba81ee4db0, tls=..., ondisk=ondisk@entry=0x0) at os/ObjectStore.cc:173
> #3 0x000055ba77a1c5ed in ObjectStore::apply_transaction (ondisk=0x0, t=..., osr=0x55ba81ee4db0, this=0x55ba81fd2000) at os/ObjectStore.h:1855
> #4 PG::_scan_snaps (this=this@entry=0x55ba9631a800, smap=...) at osd/PG.cc:3924
> #5 0x000055ba77a1d031 in PG::build_scrub_map_chunk (this=this@entry=0x55ba9631a800, map=..., start=..., end=..., deep=deep@entry=true, seed=seed@entry=4294967295, handle=...) at osd/PG.cc:3967
> #6 0x000055ba77a2c56d in PG::chunky_scrub (this=this@entry=0x55ba9631a800, handle=...) at osd/PG.cc:4431
> #7 0x000055ba77a2df5c in PG::scrub (this=0x55ba9631a800, queued=<optimized out>, handle=...) at osd/PG.cc:4166
> #8 0x000055ba779283c7 in PGQueueable::RunVis::operator() (this=this@entry=0x7f6490a0e820, op=...) at osd/OSD.cc:171
>
> It looks like the transaction successfully reached FileStore::_finish_op,
> and the `onreadable` context (which the scrub thread is waiting for) was
> queued to the filestore apply finisher.
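
For reference, ObjectStore::apply_transactions (frame #2 above) boils down
to roughly the following (a paraphrase from memory of the jewel-era
os/ObjectStore.cc, not a verbatim quote), which shows why the scrub thread
can only wake up after the apply finisher completes the onreadable context:

  Mutex my_lock("ObjectStore::apply_transaction::my_lock");
  Cond my_cond;
  bool done = false;
  int r = 0;

  // C_SafeCond sets *done and signals my_cond when completed, but it is
  // completed by the apply finisher thread, after FileStore::_finish_op
  // queues it there.
  Context *onreadable = new C_SafeCond(&my_lock, &my_cond, &done, &r);
  queue_transactions(osr, tls, onreadable, ondisk);

  my_lock.Lock();
  while (!done)
    my_cond.Wait(my_lock);   // frames #0/#1 in the backtrace above
  my_lock.Unlock();
  return r;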
> But the filestore apply finisher has got stuck completing a
> PG::C_UpdateLastRollbackInfoTrimmedToApplied context that wants the pg
> lock, which is currently held by the scrub thread:
>
> #0 0x00007f64d251fb7c in __lll_lock_wait () from ./lib64/libpthread.so.0
> #1 0x00007f64d251a7d3 in pthread_mutex_lock () from ./lib64/libpthread.so.0
> #2 0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
> #3 0x000055ba779eb650 in PG::lock (this=0x55ba9631a800, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:275
> #4 0x000055ba77a40f74 in PG::C_UpdateLastRollbackInfoTrimmedToApplied::finish (this=0x55ba8f3542b0) at osd/PG.h:534
> #5 0x000055ba77986199 in Context::complete (this=0x55ba8f3542b0, r=<optimized out>) at include/Context.h:64
> #6 0x000055ba779c483b in finish_contexts<Context> (cct=0x0, finished=..., result=0) at include/Context.h:120
> #7 0x000055ba77986ad9 in Context::complete (r=<optimized out>, this=0x55ba99ab8c80) at include/Context.h:64
> #8 C_ContextsBase<Context, Context>::complete (this=0x55ba99ab8c80, r=<optimized out>) at include/Context.h:194
> #9 0x000055ba77ef40a6 in Finisher::finisher_thread_entry (this=0x55ba81fd62c0) at common/Finisher.cc:68
> #10 0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
> #11 0x00007f64d0067e8d in clone () from ./lib64/libc.so.6
>
> To confirm that the lock is held by the scrub thread, here is the
> holder thread id:
>
> #fr 2
> #2 0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
> 110 r = pthread_mutex_lock(&_m);
> #p/x locked_by
> $12 = 0x7f6490a10700
>
> And from the crash log, 0x7f6490a10700 is our scrub thread:
>
> 7f6490a10700 -1 log_channel(cluster) log [ERR] : osd.779 found snap mapper error on pg 56.11b oid 56:d88e91fb:::rbd_data.924e5c33f6228e7.000000000003ab75:2f3 snaps missing in mapper, should be: 2f3...repaired
> 7f6484bb4700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had timed out after 15
> 7f64cc502700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had suicide timed out after 150
>
> And to confirm that this finisher (0x55ba81fd62c0) is
> FileStore::apply_finishers[0]:
>
> #bt
> #0 0x00007f64d251d4a8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
> #1 0x000055ba77cb9ac9 in Cond::WaitUntil (when=..., mutex=..., this=0x55ba81fd2530) at common/Cond.h:72
> #2 Cond::WaitInterval (this=this@entry=0x55ba81fd2530, cct=<optimized out>, mutex=..., interval=...) at common/Cond.h:81
> #3 0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
> #4 0x000055ba77cc062d in FileStore::SyncThread::entry (this=<optimized out>) at os/filestore/FileStore.h:185
> #5 0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
> #6 0x00007f64d0067e8d in clone () from ./lib64/libc.so.6
> #fr 3
> #3 0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
> 3717 sync_cond.WaitInterval(g_ceph_context, lock, max_interval);
> #p *(apply_finishers._M_impl._M_start)
> $10 = (Finisher *) 0x55ba81fd62c0
>
> So if my analysis is correct, it is unsafe to call
> ObjectStore::apply_transaction in PG::_scan_snaps while holding the pg
> lock, or we should be careful not to acquire the lock in contexts that
> are completed by FileStore::apply_finishers (like
> PG::C_UpdateLastRollbackInfoTrimmedToApplied).
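
For context, this dependency is exactly what the mimic change mentioned at
the top avoids.  Below is a minimal sketch of the pattern, assuming the
jewel/luminous-era Transaction::register_on_applied_sync and C_SafeCond
APIs (store and osr stand in for the objectstore and sequencer at hand;
this is not the literal committed code).  The on_applied_sync context is
invoked synchronously by the applying thread, before anything is handed to
FileStore::apply_finishers, so the wait no longer depends on finisher
progress even while the pg lock is held:

  Mutex my_lock("_scan_snaps::my_lock");
  Cond my_cond;
  bool done = false;
  int r = 0;

  ObjectStore::Transaction t;
  // ... record the snap mapper repair in t ...

  // Signalled inline by the applying thread, before the onreadable
  // context is queued to the (possibly blocked) apply finisher.
  t.register_on_applied_sync(new C_SafeCond(&my_lock, &my_cond, &done, &r));
  store->queue_transaction(osr, std::move(t), nullptr /* onreadable */);

  my_lock.Lock();
  while (!done)
    my_cond.Wait(my_lock);   // safe even while holding the pg lock
  my_lock.Unlock();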
>
> I am not sure whether this applies to master, as the code has
> diverged, but it looks like the same issue is present at least in
> luminous.
>
> I suppose a workaround for this particular user could be to try
> increasing filestore_apply_finisher_threads (are there any potential
> issues with this?).
>
> Or could temporarily disabling snap trim (until all pgs are scrubbed)
> avoid triggering PG::C_UpdateLastRollbackInfoTrimmedToApplied?
>
> --
> Mykola Golub
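
On the workaround question: filestore_apply_finisher_threads does exist in
jewel (default 1), but the finishers are created when the FileStore is
constructed, so it would have to be set in ceph.conf and the OSDs
restarted, rather than injected at runtime, e.g.:

  [osd]
  # default is 1, so a single blocked callback stalls every apply
  # completion on the OSD
  filestore_apply_finisher_threads = 4

One caveat (assuming the finisher is picked per sequencer, as it appears
to be): contexts from the same PG would still land on the same finisher
thread, in which case more threads would not break this particular cycle.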