Hi Mykola,

Good analysis.  This reminds me of 907b6281e99ece3677dd7b012cf4955731db6120,
which I think addresses this issue.  In particular, it waits for
applied_sync, which is after the write is applied but before the other
callbacks fire (which may block on the pg lock, as you note).

So I think this is resolved now in mimic...

sage


On Mon, 29 Oct 2018, Mykola Golub wrote:

> Hi,
>
> I am investigating a "suicide timeout" during scrubbing reported by
> our users.
>
> The cluster is running jewel 10.2.10, and a distinguishing feature of
> it is that it has a cache pool for which, for some reason, a pg scrub
> generates many errors like this:
>
> osd.779 found snap mapper error on pg 56.11b oid 56:d8ba1d80:::rbd_data.924e5c33f6228e7.0000000000038670:2f3 snaps missing in mapper, should be: 2f3...repaired
>
> After one such error the scrub thread gets stuck waiting for the
> apply_transaction that fixes the snap mapper to return:
>
> #0 0x00007f64d251d0ff in pthread_cond_wait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
> #1 0x000055ba77c099eb in Cond::Wait (mutex=..., this=0x7f6490a0d640) at common/Cond.h:56
> #2 ObjectStore::apply_transactions (this=this@entry=0x55ba81fd2000, osr=osr@entry=0x55ba81ee4db0, tls=..., ondisk=ondisk@entry=0x0) at os/ObjectStore.cc:173
> #3 0x000055ba77a1c5ed in ObjectStore::apply_transaction (ondisk=0x0, t=..., osr=0x55ba81ee4db0, this=0x55ba81fd2000) at os/ObjectStore.h:1855
> #4 PG::_scan_snaps (this=this@entry=0x55ba9631a800, smap=...) at osd/PG.cc:3924
> #5 0x000055ba77a1d031 in PG::build_scrub_map_chunk (this=this@entry=0x55ba9631a800, map=..., start=..., end=..., deep=deep@entry=true, seed=seed@entry=4294967295, handle=...) at osd/PG.cc:3967
> #6 0x000055ba77a2c56d in PG::chunky_scrub (this=this@entry=0x55ba9631a800, handle=...) at osd/PG.cc:4431
> #7 0x000055ba77a2df5c in PG::scrub (this=0x55ba9631a800, queued=<optimized out>, handle=...) at osd/PG.cc:4166
> #8 0x000055ba779283c7 in PGQueueable::RunVis::operator() (this=this@entry=0x7f6490a0e820, op=...) at osd/OSD.cc:171
>
> It looks like the transaction successfully reached FileStore::_finish_op,
> and the `onreadable` context (which the scrub thread is waiting for) was
> queued to the filestore apply finisher.
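
For reference, ObjectStore::apply_transactions (frame #2 above) boils down
to roughly the following (a paraphrase from memory of the jewel-era
os/ObjectStore.cc, not a verbatim quote), which shows why the scrub thread
can only wake up after the apply finisher completes the onreadable context:

  Mutex my_lock("ObjectStore::apply_transaction::my_lock");
  Cond my_cond;
  bool done = false;
  int r = 0;

  // C_SafeCond sets *done and signals my_cond when completed, but it is
  // completed by the apply finisher thread, after FileStore::_finish_op
  // queues it there.
  Context *onreadable = new C_SafeCond(&my_lock, &my_cond, &done, &r);
  queue_transactions(osr, tls, onreadable, ondisk);

  my_lock.Lock();
  while (!done)
    my_cond.Wait(my_lock);   // frames #0/#1 in the backtrace above
  my_lock.Unlock();
  return r;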
> But the filestore apply finisher has got stuck completing a
> PG::C_UpdateLastRollbackInfoTrimmedToApplied context that wants the pg
> lock, which is currently held by the scrub thread:
>
> #0 0x00007f64d251fb7c in __lll_lock_wait () from ./lib64/libpthread.so.0
> #1 0x00007f64d251a7d3 in pthread_mutex_lock () from ./lib64/libpthread.so.0
> #2 0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
> #3 0x000055ba779eb650 in PG::lock (this=0x55ba9631a800, no_lockdep=no_lockdep@entry=false) at osd/PG.cc:275
> #4 0x000055ba77a40f74 in PG::C_UpdateLastRollbackInfoTrimmedToApplied::finish (this=0x55ba8f3542b0) at osd/PG.h:534
> #5 0x000055ba77986199 in Context::complete (this=0x55ba8f3542b0, r=<optimized out>) at include/Context.h:64
> #6 0x000055ba779c483b in finish_contexts<Context> (cct=0x0, finished=..., result=0) at include/Context.h:120
> #7 0x000055ba77986ad9 in Context::complete (r=<optimized out>, this=0x55ba99ab8c80) at include/Context.h:64
> #8 C_ContextsBase<Context, Context>::complete (this=0x55ba99ab8c80, r=<optimized out>) at include/Context.h:194
> #9 0x000055ba77ef40a6 in Finisher::finisher_thread_entry (this=0x55ba81fd62c0) at common/Finisher.cc:68
> #10 0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
> #11 0x00007f64d0067e8d in clone () from ./lib64/libc.so.6
>
> To confirm that the lock is held by the scrub thread, here is the
> holder thread id:
>
> #fr 2
> #2 0x000055ba77f72548 in Mutex::Lock (this=this@entry=0x55ba9631ad28, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
> 110 r = pthread_mutex_lock(&_m);
> #p/x locked_by
> $12 = 0x7f6490a10700
>
> And from the crash log, 0x7f6490a10700 is our scrub thread:
>
> 7f6490a10700 -1 log_channel(cluster) log [ERR] : osd.779 found snap mapper error on pg 56.11b oid 56:d88e91fb:::rbd_data.924e5c33f6228e7.000000000003ab75:2f3 snaps missing in mapper, should be: 2f3...repaired
> 7f6484bb4700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had timed out after 15
> 7f64cc502700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f6490a10700' had suicide timed out after 150
>
> And to confirm that this finisher (0x55ba81fd62c0) is
> FileStore::apply_finishers[0]:
>
> #bt
> #0 0x00007f64d251d4a8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from ./lib64/libpthread.so.0
> #1 0x000055ba77cb9ac9 in Cond::WaitUntil (when=..., mutex=..., this=0x55ba81fd2530) at common/Cond.h:72
> #2 Cond::WaitInterval (this=this@entry=0x55ba81fd2530, cct=<optimized out>, mutex=..., interval=...) at common/Cond.h:81
> #3 0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
> #4 0x000055ba77cc062d in FileStore::SyncThread::entry (this=<optimized out>) at os/filestore/FileStore.h:185
> #5 0x00007f64d2518724 in start_thread () from ./lib64/libpthread.so.0
> #6 0x00007f64d0067e8d in clone () from ./lib64/libc.so.6
> #fr 3
> #3 0x000055ba77c95d8e in FileStore::sync_entry (this=0x55ba81fd2000) at os/filestore/FileStore.cc:3717
> 3717 sync_cond.WaitInterval(g_ceph_context, lock, max_interval);
> #p *(apply_finishers._M_impl._M_start)
> $10 = (Finisher *) 0x55ba81fd62c0
>
> So if my analysis is correct, it is unsafe to call
> ObjectStore::apply_transaction in PG::_scan_snaps while holding the pg
> lock, or we should be careful not to acquire the lock in contexts that
> are completed by FileStore::apply_finishers (like
> PG::C_UpdateLastRollbackInfoTrimmedToApplied).
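
For context, this dependency is exactly what the mimic change mentioned at
the top avoids.  Below is a minimal sketch of the pattern, assuming the
jewel/luminous-era Transaction::register_on_applied_sync and C_SafeCond
APIs (store and osr stand in for the objectstore and sequencer at hand;
this is not the literal committed code).  The on_applied_sync context is
invoked synchronously by the applying thread, before anything is handed to
FileStore::apply_finishers, so the wait no longer depends on finisher
progress even while the pg lock is held:

  Mutex my_lock("_scan_snaps::my_lock");
  Cond my_cond;
  bool done = false;
  int r = 0;

  ObjectStore::Transaction t;
  // ... record the snap mapper repair in t ...

  // Signalled inline by the applying thread, before the onreadable
  // context is queued to the (possibly blocked) apply finisher.
  t.register_on_applied_sync(new C_SafeCond(&my_lock, &my_cond, &done, &r));
  store->queue_transaction(osr, std::move(t), nullptr /* onreadable */);

  my_lock.Lock();
  while (!done)
    my_cond.Wait(my_lock);   // safe even while holding the pg lock
  my_lock.Unlock();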
>
> I am not sure whether this applies to master, as the code has
> diverged, but it looks like the same issue is present at least in
> luminous.
>
> I suppose a workaround for this particular user could be to try
> increasing filestore_apply_finisher_threads (are there any potential
> issues with this?).
>
> Or could temporarily disabling snap trim (until all pgs are scrubbed)
> avoid triggering PG::C_UpdateLastRollbackInfoTrimmedToApplied?
>
> --
> Mykola Golub
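
On the workaround question: filestore_apply_finisher_threads does exist in
jewel (default 1), but the finishers are created when the FileStore is
constructed, so it would have to be set in ceph.conf and the OSDs
restarted, rather than injected at runtime, e.g.:

  [osd]
  # default is 1, so a single blocked callback stalls every apply
  # completion on the OSD
  filestore_apply_finisher_threads = 4

One caveat (assuming the finisher is picked per sequencer, as it appears
to be): contexts from the same PG would still land on the same finisher
thread, in which case more threads would not break this particular cycle.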