Circular locking dependency with pblk

Javier González <jg@xxxxxxxxxxx> · Thu, 5 Oct 2017 12:53:50 +0200

Hi,

lockdep is reporting a circular dependency when using XFS and pblk,
which I am a bit confused about.

This happens when XFS sends a number of nested reads and (at least) one
of them hits partially pblk's cache. In this case, pblk will retrieve
the cached lbas and form a new bio, which is submitted _synchronously_
to the media using struct completion. The original bio is then populated
with the read data.

What lockdep complains about, is that the unlocking operation in
complete() has a circular dependency with ionode->i_rwsem when they both
happen on the same core, which is different from the core that issued
wait_for_completion_io_timeout() and is waiting for the partial read.
However, AFAIU complete() happens in interrupt context, so this should
not be a problem.

The only thing I can think of is that XFS is making an assumption when
submitting direct I/O that there will be no synchronous I/O coming from
anybody underneath.

Any thoughts?

TRACE:
[ 8557.938336] ======================================================
[ 8557.944513] WARNING: possible circular locking dependency detected
[ 8557.950688] 4.14.0-rc2--509811f1f2df #726 Not tainted
[ 8557.955739] ------------------------------------------------------
[ 8557.961919] rocksdb:bg11/19129 is trying to acquire lock:
[ 8557.967319]  (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8557.975063]
               but now in release context of a crosslock acquired at the following:
[ 8557.983921]  ((complete)&wait#5){+.+.}, at: [<ffffffffad933d0a>] pblk_fill_partial_read_bio+0x22a/0x520
[ 8557.993318]
               which lock already depends on the new lock.

[ 8558.001490]
               the existing dependency chain (in reverse order) is:
[ 8558.008966]
               -> #1 ((complete)&wait#5){+.+.}:
[ 8558.014715]        __lock_acquire+0x1186/0x11d0
[ 8558.019245]        lock_acquire+0xed/0x1e0
[ 8558.023337]        wait_for_completion_io_timeout+0x65/0x180
[ 8558.028989]        pblk_fill_partial_read_bio+0x22a/0x520
[ 8558.034375]        pblk_submit_read+0x158/0x3a0
[ 8558.038900]        pblk_make_rq+0xc1/0x110
[ 8558.042992]        generic_make_request+0x11d/0x3c0
[ 8558.047862]        submit_bio+0x73/0x140
[ 8558.051781]        iomap_dio_actor+0x1f8/0x3a0
[ 8558.056216]        iomap_apply+0xa1/0x110
[ 8558.060220]        iomap_dio_rw+0x26f/0x3c0
[ 8558.064398]        xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.069269]        xfs_file_read_iter+0xab/0xc0
[ 8558.073794]        __vfs_read+0xef/0x160
[ 8558.077709]        vfs_read+0xa8/0x150
[ 8558.081454]        SyS_pread64+0x98/0xc0
[ 8558.085372]        entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.090501]
               -> #0 (&inode->i_rwsem){++++}:
[ 8558.096067]        down_read_nested+0x49/0x70
[ 8558.100417]        xfs_ilock+0x1a1/0x230
[ 8558.104335]        xfs_file_dio_aio_read+0x74/0x1c0
[ 8558.109202]        xfs_file_read_iter+0xab/0xc0
[ 8558.113729]        __vfs_read+0xef/0x160
[ 8558.117646]
               other info that might help us debug this:

[ 8558.125636]  Possible unsafe locking scenario by crosslock:

[ 8558.132672]        CPU0                    CPU1
[ 8558.137196]        ----                    ----
[ 8558.141720]   lock(&inode->i_rwsem);
[ 8558.145291]   lock((complete)&wait#5);
[ 8558.149035]                                lock(&inode->i_rwsem);
[ 8558.155118]                                unlock((complete)&wait#5);
[ 8558.161550]
                *** DEADLOCK ***

[ 8558.167461] 4 locks held by rocksdb:bg11/19129:
[ 8558.171984]  #0:  (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8558.180148]  #1:  (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
[ 8558.189091]  #2:  (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
[ 8558.198384]  #3:  (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
[ 8558.205948]
               stack backtrace:
[ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
[ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
[ 8558.225675] Call Trace:
[ 8558.228123]  dump_stack+0x85/0xc7
[ 8558.231438]  print_circular_bug+0x226/0x330
[ 8558.235616]  ? print_bfs_bug+0x40/0x40
[ 8558.239358]  check_prev_add+0x432/0x840
[ 8558.243190]  ? __lock_acquire+0x736/0x11d0
[ 8558.247282]  lock_commit_crosslock+0x3d3/0x5d0
[ 8558.251718]  ? lock_commit_crosslock+0x3d3/0x5d0
[ 8558.256328]  complete+0x29/0x60
[ 8558.259469]  pblk_end_io_sync+0x12/0x20
[ 8558.263297]  nvm_end_io+0x2b/0x30
[ 8558.266607]  nvme_nvm_end_io+0x2e/0x50
[ 8558.270351]  blk_mq_end_request+0x3e/0x70
[ 8558.274360]  nvme_complete_rq+0x1c/0xd0
[ 8558.278194]  nvme_pci_complete_rq+0x7b/0x130
[ 8558.282459]  __blk_mq_complete_request+0xa3/0x160
[ 8558.287156]  blk_mq_complete_request+0x16/0x20
[ 8558.291592]  nvme_process_cq+0xf8/0x1e0
[ 8558.295424]  nvme_queue_rq+0x16e/0x9a0
[ 8558.299172]  blk_mq_dispatch_rq_list+0x19e/0x330
[ 8558.303787]  ? blk_mq_flush_busy_ctxs+0x91/0x130
[ 8558.308400]  blk_mq_sched_dispatch_requests+0x19d/0x1d0
[ 8558.313617]  __blk_mq_run_hw_queue+0x12e/0x1d0
[ 8558.318053]  __blk_mq_delay_run_hw_queue+0xb9/0xd0
[ 8558.322837]  blk_mq_run_hw_queue+0x14/0x20
[ 8558.326928]  blk_mq_sched_insert_request+0xa4/0x180
[ 8558.331797]  blk_execute_rq_nowait+0x72/0xf0
[ 8558.336061]  nvme_nvm_submit_io+0xd9/0x130
[ 8558.340151]  nvm_submit_io+0x3c/0x70
[ 8558.343723]  pblk_submit_io+0x1b/0x20
[ 8558.347379]  pblk_submit_read+0x1ec/0x3a0
[ 8558.351385]  pblk_make_rq+0xc1/0x110
[ 8558.354954]  generic_make_request+0x11d/0x3c0
[ 8558.359308]  ? iov_iter_get_pages+0xc0/0x350
[ 8558.363569]  submit_bio+0x73/0x140
[ 8558.366966]  ? submit_bio+0x73/0x140
[ 8558.370541]  ? bio_iov_iter_get_pages+0xd7/0x120
[ 8558.375147]  iomap_dio_actor+0x1f8/0x3a0
[ 8558.379066]  ? iomap_dio_zero+0x110/0x110
[ 8558.383068]  iomap_apply+0xa1/0x110
[ 8558.386553]  iomap_dio_rw+0x26f/0x3c0
[ 8558.390210]  ? iomap_dio_zero+0x110/0x110
[ 8558.394216]  ? down_read_nested+0x49/0x70
[ 8558.398219]  xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.402570]  ? xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.407094]  xfs_file_read_iter+0xab/0xc0
[ 8558.411098]  __vfs_read+0xef/0x160
[ 8558.414494]  vfs_read+0xa8/0x150
[ 8558.417718]  SyS_pread64+0x98/0xc0
[ 8558.421116]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.425726] RIP: 0033:0x7f816caaed43
[ 8558.429296] RSP: 002b:00007f8150ff4d50 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 8558.436854] RAX: ffffffffffffffda RBX: 00007f8130219a10 RCX: 00007f816caaed43
[ 8558.443978] RDX: 0000000000200000 RSI: 00007f8130408000 RDI: 0000000000000114
[ 8558.451102] RBP: 00007f8150ff6110 R08: 00007f8150ff4f20 R09: 00007f8130408000
[ 8558.458226] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000114
[ 8558.465350] R13: 000000000143fb20 R14: 0000000001445fb8 R15: 00007f8150ff6550
[40409.278987] perf: interrupt took too long (6252 > 6245), lowering kernel.perf_event_max_sample_rate to 31750
[40458.049396] XFS (pblk0): Unmounting Filesystem
javigon@hero:~$
[ 8558.117646]
               other info that might help us debug this:

[ 8558.125636]  Possible unsafe locking scenario by crosslock:

[ 8558.132672]        CPU0                    CPU1
[ 8558.137196]        ----                    ----
[ 8558.141720]   lock(&inode->i_rwsem);
[ 8558.145291]   lock((complete)&wait#5);
[ 8558.149035]                                lock(&inode->i_rwsem);
[ 8558.155118]                                unlock((complete)&wait#5);
[ 8558.161550]
                *** DEADLOCK ***

[ 8558.167461] 4 locks held by rocksdb:bg11/19129:
[ 8558.171984]  #0:  (&inode->i_rwsem){++++}, at: [<ffffffffad5648d1>] xfs_ilock+0x1a1/0x230
[ 8558.180148]  #1:  (rcu_read_lock){....}, at: [<ffffffffad72b8cc>] __blk_mq_run_hw_queue+0xec/0x1d0
[ 8558.189091]  #2:  (&(&nvmeq->q_lock)->rlock){-.-.}, at: [<ffffffffad9d438c>] nvme_queue_rq+0x14c/0x9a0
[ 8558.198384]  #3:  (&x->wait#17){-.-.}, at: [<ffffffffad16444d>] complete+0x1d/0x60
[ 8558.205948]
               stack backtrace:
[ 8558.210300] CPU: 6 PID: 19129 Comm: rocksdb:bg11 Not tainted 4.14.0-rc2--509811f1f2df #726
[ 8558.218550] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016
[ 8558.225675] Call Trace:
[ 8558.228123]  dump_stack+0x85/0xc7
[ 8558.231438]  print_circular_bug+0x226/0x330
[ 8558.235616]  ? print_bfs_bug+0x40/0x40
[ 8558.239358]  check_prev_add+0x432/0x840
[ 8558.243190]  ? __lock_acquire+0x736/0x11d0
[ 8558.247282]  lock_commit_crosslock+0x3d3/0x5d0
[ 8558.251718]  ? lock_commit_crosslock+0x3d3/0x5d0
[ 8558.256328]  complete+0x29/0x60
[ 8558.259469]  pblk_end_io_sync+0x12/0x20
[ 8558.263297]  nvm_end_io+0x2b/0x30
[ 8558.266607]  nvme_nvm_end_io+0x2e/0x50
[ 8558.270351]  blk_mq_end_request+0x3e/0x70
[ 8558.274360]  nvme_complete_rq+0x1c/0xd0
[ 8558.278194]  nvme_pci_complete_rq+0x7b/0x130
[ 8558.282459]  __blk_mq_complete_request+0xa3/0x160
[ 8558.287156]  blk_mq_complete_request+0x16/0x20
[ 8558.291592]  nvme_process_cq+0xf8/0x1e0
[ 8558.295424]  nvme_queue_rq+0x16e/0x9a0
[ 8558.299172]  blk_mq_dispatch_rq_list+0x19e/0x330
[ 8558.303787]  ? blk_mq_flush_busy_ctxs+0x91/0x130
[ 8558.308400]  blk_mq_sched_dispatch_requests+0x19d/0x1d0
[ 8558.313617]  __blk_mq_run_hw_queue+0x12e/0x1d0
[ 8558.318053]  __blk_mq_delay_run_hw_queue+0xb9/0xd0
[ 8558.322837]  blk_mq_run_hw_queue+0x14/0x20
[ 8558.326928]  blk_mq_sched_insert_request+0xa4/0x180
[ 8558.331797]  blk_execute_rq_nowait+0x72/0xf0
[ 8558.336061]  nvme_nvm_submit_io+0xd9/0x130
[ 8558.340151]  nvm_submit_io+0x3c/0x70
[ 8558.343723]  pblk_submit_io+0x1b/0x20
[ 8558.347379]  pblk_submit_read+0x1ec/0x3a0
[ 8558.351385]  pblk_make_rq+0xc1/0x110
[ 8558.354954]  generic_make_request+0x11d/0x3c0
[ 8558.359308]  ? iov_iter_get_pages+0xc0/0x350
[ 8558.363569]  submit_bio+0x73/0x140
[ 8558.366966]  ? submit_bio+0x73/0x140
[ 8558.370541]  ? bio_iov_iter_get_pages+0xd7/0x120
[ 8558.375147]  iomap_dio_actor+0x1f8/0x3a0
[ 8558.379066]  ? iomap_dio_zero+0x110/0x110
[ 8558.383068]  iomap_apply+0xa1/0x110
[ 8558.386553]  iomap_dio_rw+0x26f/0x3c0
[ 8558.390210]  ? iomap_dio_zero+0x110/0x110
[ 8558.394216]  ? down_read_nested+0x49/0x70
[ 8558.398219]  xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.402570]  ? xfs_file_dio_aio_read+0x88/0x1c0
[ 8558.407094]  xfs_file_read_iter+0xab/0xc0
[ 8558.411098]  __vfs_read+0xef/0x160
[ 8558.414494]  vfs_read+0xa8/0x150
[ 8558.417718]  SyS_pread64+0x98/0xc0
[ 8558.421116]  entry_SYSCALL_64_fastpath+0x23/0xc2
[ 8558.425726] RIP: 0033:0x7f816caaed43
[ 8558.429296] RSP: 002b:00007f8150ff4d50 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 8558.436854] RAX: ffffffffffffffda RBX: 00007f8130219a10 RCX: 00007f816caaed43
[ 8558.443978] RDX: 0000000000200000 RSI: 00007f8130408000 RDI: 0000000000000114
[ 8558.451102] RBP: 00007f8150ff6110 R08: 00007f8150ff4f20 R09: 00007f8130408000
[ 8558.458226] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000114
[ 8558.465350] R13: 000000000143fb20 R14: 0000000001445fb8 R15: 00007f8150ff6550
[40409.278987] perf: interrupt took too long (6252 > 6245), lowering kernel.perf_event_max_sample_rate to 31750
[40458.049396] XFS (pblk0): Unmounting Filesystem

Thanks,
Javier

Attachment:
signature.asc

Description: Message signed with OpenPGP