> On 6 Oct 2017, at 01.36, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a number of nested reads and (at least)
>> one of them partially hits pblk's cache. In this case, pblk will
>> retrieve the cached LBAs and form a new bio, which is submitted
>> _synchronously_ to the media using a struct completion. The original
>> bio is then populated with the read data.
>>
>> What lockdep complains about is that the unlock operation in
>> complete() has a circular dependency with inode->i_rwsem when they
>> both happen on the same core, which is different from the core that
>> issued wait_for_completion_io_timeout() and is waiting for the
>> partial read. However, AFAIU complete() happens in interrupt
>> context, so this should not be a problem.
>>
>> The only thing I can think of is that XFS is making an assumption
>> when submitting direct I/O that there will be no synchronous I/O
>> coming from anybody underneath.
>>
>> Any thoughts?
>
> The new lockdep completion tracking that was included in 4.14 can
> throw false positives because it doesn't (yet) have the capability
> to understand the multiply layered completion contexts we have in
> the storage stack.
>
> Peter, here's another for you to ponder....
>
>> TRACE:
>> [ 8557.938336] ======================================================
>> [ 8557.944513] WARNING: possible circular locking dependency detected
>> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
>> [ 8557.955739] ------------------------------------------------------
>> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
>> [ 8557.967319] (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
>> [ 8557.975063]
>> but now in release context of a crosslock acquired at the following:
>> [ 8557.983921] ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8557.993318]
>> which lock already depends on the new lock.
>>
>> [ 8558.001490]
>> the existing dependency chain (in reverse order) is:
>> [ 8558.008966]
>> -> #1 ((complete)&wait#5){+.+.}:
>> [ 8558.014715] __lock_acquire+0x1186/0x11d0
>> [ 8558.019245] lock_acquire+0xed/0x1e0
>> [ 8558.023337] wait_for_completion_io_timeout+0x65/0x180
>> [ 8558.028989] pblk_fill_partial_read_bio+0x22a/0x520
>> [ 8558.034375] pblk_submit_read+0x158/0x3a0
>> [ 8558.038900] pblk_make_rq+0xc1/0x110
>> [ 8558.042992] generic_make_request+0x11d/0x3c0
>> [ 8558.047862] submit_bio+0x73/0x140
>> [ 8558.051781] iomap_dio_actor+0x1f8/0x3a0
>> [ 8558.056216] iomap_apply+0xa1/0x110
>> [ 8558.060220] iomap_dio_rw+0x26f/0x3c0
>> [ 8558.064398] xfs_file_dio_aio_read+0x88/0x1c0
>> [ 8558.069269] xfs_file_read_iter+0xab/0xc0
>> [ 8558.073794] __vfs_read+0xef/0x160
>> [ 8558.077709] vfs_read+0xa8/0x150
>> [ 8558.081454] SyS_pread64+0x98/0xc0
>> [ 8558.085372] entry_SYSCALL_64_fastpath+0x23/0xc2
>
> This is completely normal, blocked on a bio completion with a
> filesystem lock held. More commonly we see this sort of thing
> through submit_bio_wait().

Yes. We do the same construct in pblk.
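For reference, the construct looks roughly like this (a simplified
sketch of the completion-based synchronous submission, not the actual
pblk code; the function names and the timeout constant are
illustrative). In the trace above this corresponds to
pblk_end_io_sync() calling complete() and pblk_fill_partial_read_bio()
doing the wait_for_completion_io_timeout():

#include <linux/bio.h>
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

/* Illustrative timeout; the real pblk value may differ. */
#define PARTIAL_READ_TIMEOUT_MS	5000

/* End-io callback: this complete() is the crosslock release lockdep sees. */
static void partial_read_end_io(struct bio *bio)
{
	struct completion *waiting = bio->bi_private;

	complete(waiting);
}

/*
 * Submit the bio built for the cached part of the read and wait for it
 * synchronously. The caller is still under the i_rwsem taken by the
 * XFS direct I/O path, which is why lockdep connects the two "locks".
 */
static int submit_partial_read_sync(struct bio *new_bio)
{
	DECLARE_COMPLETION_ONSTACK(wait);

	new_bio->bi_private = &wait;
	new_bio->bi_end_io = partial_read_end_io;

	submit_bio(new_bio);	/* goes through generic_make_request() */

	if (!wait_for_completion_io_timeout(&wait,
				msecs_to_jiffies(PARTIAL_READ_TIMEOUT_MS)))
		return -ETIME;

	return 0;
}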
>
>> [ 8558.125636] Possible unsafe locking scenario by crosslock:
>>
>> [ 8558.132672] CPU0 CPU1
>> [ 8558.137196] ---- ----
>> [ 8558.141720] lock(&inode->i_rwsem);
>> [ 8558.145291] lock((complete)&wait#5);
>> [ 8558.149035] lock(&inode->i_rwsem);
>> [ 8558.155118] unlock((complete)&wait#5);
>> [ 8558.161550]
>> *** DEADLOCK ***
>>
>> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
>> [ 8558.171984] #0: (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
>> [ 8558.180148] #1: (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
>> [ 8558.189091] #2: (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
>> [ 8558.198384] #3: (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
>> [ 8558.205948]
>> stack backtrace:
>> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
>> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
>> [ 8558.225675] Call Trace:
>> [ 8558.228123] dump_stack+0x85/0xc7
>> [ 8558.231438] print_circular_bug+0x226/0x330
>> [ 8558.235616] ? print_bfs_bug+0x40/0x40
>> [ 8558.239358] check_prev_add+0x432/0x840
>> [ 8558.243190] ? __lock_acquire+0x736/0x11d0
>> [ 8558.247282] lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.251718] ? lock_commit_crosslock+0x3d3/0x5d0
>> [ 8558.256328] complete+0x29/0x60
>> [ 8558.259469] pblk_end_io_sync+0x12/0x20
>> [ 8558.263297] nvm_end_io+0x2b/0x30
>> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
>
> And this is where lockdep gets confused. Up until this point, it's
> only ever seen the bio completion come from /outside/ the XFS
> i_rwsem context (i.e. async completion from interrupts). With a
> synchronous completion like this, the completion has now been done
> from inside the i_rwsem context and that is what triggers lockdep.
>
> This is safe to do because even if we have a bio blocked on a
> completion queue, we've got all the way down to this complete() call
> from a *different bio context* inside a read-locked i_rwsem. Hence
> there's no way this can be a deadlock even if the two processes are
> holding the same i_rwsem.
>
> Therefore: false positive.

Thanks for the explanation, Dave. I could not see how waiting on a
struct completion under generic_make_request() could in any way
interleave with an FS lock, but it is good to have it confirmed.

Javier.