Re: Circular locking dependency with pblk

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 6 Oct 2017 10:36:09 +1100

On Thu, Oct 05, 2017 at 12:53:50PM +0200, Javier González wrote:
> Hi,
> 
> lockdep is reporting a circular dependency when using XFS and pblk,
> which I am a bit confused about.
> 
> This happens when XFS sends a number of nested reads and (at least) one
> of them hits partially pblk's cache. In this case, pblk will retrieve
> the cached lbas and form a new bio, which is submitted _synchronously_
> to the media using struct completion. The original bio is then populated
> with the read data.
> 
> What lockdep complains about, is that the unlocking operation in
> complete() has a circular dependency with ionode->i_rwsem when they both
> happen on the same core, which is different from the core that issued
> wait_for_completion_io_timeout() and is waiting for the partial read.
> However, AFAIU complete() happens in interrupt context, so this should
> not be a problem.
> 
> The only thing I can think of is that XFS is making an assumption when
> submitting direct I/O that there will be no synchronous I/O coming from
> anybody underneath.
> 
> Any thoughts?

The new lockdep completion tracking that was included in 4.14 can
throw false positives because it doesn't (yet) have the capability
to understand the multiply layered completion contexts we have in
the storage stack.

Peter, here's another for you to ponder....

> TRACE:
> [ 8557.938336] ======================================================
> [ 8557.944513] WARNING: possible circular locking dependency detected
> [ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
> [ 8557.955739] ------------------------------------------------------
> [ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
> [ 8557.967319]  (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8557.975063]
>                but now in release context of a crosslock acquired at the following:
> [ 8557.983921]  ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
> [ 8557.993318]
>                which lock already depends on the new lock.
> 
> [ 8558.001490]
>                the existing dependency chain (in reverse order) is:
> [ 8558.008966]
>                -> #1 ((complete)&wait#5){+.+.}:
> [ 8558.014715]        __lock_acquire+0x1186/0x11d0
> [ 8558.019245]        lock_acquire+0xed/0x1e0
> [ 8558.023337]        wait_for_completion_io_timeout+0x65/0x180
> [ 8558.028989]        pblk_fill_partial_read_bio+0x22a/0x520
> [ 8558.034375]        pblk_submit_read+0x158/0x3a0
> [ 8558.038900]        pblk_make_rq+0xc1/0x110
> [ 8558.042992]        generic_make_request+0x11d/0x3c0
> [ 8558.047862]        submit_bio+0x73/0x140
> [ 8558.051781]        iomap_dio_actor+0x1f8/0x3a0
> [ 8558.056216]        iomap_apply+0xa1/0x110
> [ 8558.060220]        iomap_dio_rw+0x26f/0x3c0
> [ 8558.064398]        xfs_file_dio_aio_read+0x88/0x1c0
> [ 8558.069269]        xfs_file_read_iter+0xab/0xc0
> [ 8558.073794]        __vfs_read+0xef/0x160
> [ 8558.077709]        vfs_read+0xa8/0x150
> [ 8558.081454]        SyS_pread64+0x98/0xc0
> [ 8558.085372]        entry_SYSCALL_64_fastpath+0x23/0xc2

This is completely normal, blocked on a bio completion with a
filesystem lock held. More commonly we see this sort of thing
through through submit_bio_wait().

> [ 8558.125636]  Possible unsafe locking scenario by crosslock:
> 
> [ 8558.132672]        CPU0                    CPU1
> [ 8558.137196]        ----                    ----
> [ 8558.141720]   lock(&inode->i_rwsem);
> [ 8558.145291]   lock((complete)&wait#5);
> [ 8558.149035]                                lock(&inode->i_rwsem);
> [ 8558.155118]                                unlock((complete)&wait#5);
> [ 8558.161550]
>                 *** DEADLOCK ***
> 
> [ 8558.167461] 4 locks held by rocksdb:bg11/19129:
> [ 8558.171984]  #0:  (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
> [ 8558.180148]  #1:  (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
> [ 8558.189091]  #2:  (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
> [ 8558.198384]  #3:  (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
> [ 8558.205948]
>                stack backtrace:
> [ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
> [ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
> [ 8558.225675] Call Trace:
> [ 8558.228123]  dump_stack+0x85/0xc7
> [ 8558.231438]  print_circular_bug+0x226/0x330
> [ 8558.235616]  ? print_bfs_bug+0x40/0x40
> [ 8558.239358]  check_prev_add+0x432/0x840
> [ 8558.243190]  ? __lock_acquire+0x736/0x11d0
> [ 8558.247282]  lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.251718]  ? lock_commit_crosslock+0x3d3/0x5d0
> [ 8558.256328]  complete+0x29/0x60
> [ 8558.259469]  pblk_end_io_sync+0x12/0x20
> [ 8558.263297]  nvm_end_io+0x2b/0x30
> [ 8558.266607]  nvme_nvm_end_io+0x2e/0x50

And this is where lockdep gets confused. Up until this point, it's
only ever seen the bio completion come from /outside/ the XFS
i_rwsem context (i.e. async completion form interupts). With a
synchronous completion like this, the completion has now been done
from inside the i_rwsem context and that is what triggers lockdep.

This is safe to do because even if we have a bio blocked on a
completion queue, we've got all we way down to this complete() call
from a *different bio context* inside a read locked i_rwsem. Hence
there's no way this can be a deadlock even if the two processes are
holding the same i_rwsem.

Therefore: false positive.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx