Re: Circular locking dependency with pblk

Javier González <jg@xxxxxxxxxxx> · Thu, 5 Oct 2017 18:44:06 +0200

> On 5 Oct 2017, at 18.24, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> On 10/05/2017 04:53 AM, Javier González wrote:
>> Hi,
>> 
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>> 
>> This happens when XFS sends a number of nested reads and (at least) one
>> of them partially hits pblk's cache. In this case, pblk will retrieve
>> the cached lbas and form a new bio, which is submitted _synchronously_
>> to the media using struct completion. The original bio is then populated
>> with the read data.
>> 
>> What lockdep complains about is that the unlocking operation in
>> complete() has a circular dependency with inode->i_rwsem when they both
>> happen on the same core, which is different from the core that issued
>> wait_for_completion_io_timeout() and is waiting for the partial read.
>> However, AFAIU complete() happens in interrupt context, so this should
>> not be a problem.
> 
> But the very trace you are posting shows the completion being done
> inline, since we catch it at submission time:
> 
>> [ 8558.256328]  complete+0x29/0x60
>> [ 8558.259469]  pblk_end_io_sync+0x12/0x20
>> [ 8558.263297]  nvm_end_io+0x2b/0x30
>> [ 8558.266607]  nvme_nvm_end_io+0x2e/0x50
>> [ 8558.270351]  blk_mq_end_request+0x3e/0x70
>> [ 8558.274360]  nvme_complete_rq+0x1c/0xd0
>> [ 8558.278194]  nvme_pci_complete_rq+0x7b/0x130
>> [ 8558.282459]  __blk_mq_complete_request+0xa3/0x160
>> [ 8558.287156]  blk_mq_complete_request+0x16/0x20
>> [ 8558.291592]  nvme_process_cq+0xf8/0x1e0
>> [ 8558.295424]  nvme_queue_rq+0x16e/0x9a0
>> [ 8558.299172]  blk_mq_dispatch_rq_list+0x19e/0x330
>> [ 8558.303787]  ? blk_mq_flush_busy_ctxs+0x91/0x130
>> [ 8558.308400]  blk_mq_sched_dispatch_requests+0x19d/0x1d0
>> [ 8558.313617]  __blk_mq_run_hw_queue+0x12e/0x1d0
>> [ 8558.318053]  __blk_mq_delay_run_hw_queue+0xb9/0xd0
>> [ 8558.322837]  blk_mq_run_hw_queue+0x14/0x20
>> [ 8558.326928]  blk_mq_sched_insert_request+0xa4/0x180
>> [ 8558.331797]  blk_execute_rq_nowait+0x72/0xf0
>> [ 8558.336061]  nvme_nvm_submit_io+0xd9/0x130
>> [ 8558.340151]  nvm_submit_io+0x3c/0x70
>> [ 8558.343723]  pblk_submit_io+0x1b/0x20
>> [ 8558.347379]  pblk_submit_read+0x1ec/0x3a0
> 
> [snip]
> 
> This happens since we call nvme_process_cq() after submitting IO,
> just in case there's something we can complete.
> 
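
To make the inline completion concrete, here is a rough userspace sketch of the pattern in the trace above. It is not pblk or NVMe code: struct completion is modelled with a pthread mutex and condition variable, and toy_request, toy_end_io(), toy_queue_rq() and toy_submit_sync() are hypothetical names invented for this illustration. The only assumption carried over from the trace is that the submission path polls for finished requests, so the end_io callback (and therefore complete()) can run in the submitter's own context before it ever sleeps in the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical request; not a real pblk or NVMe structure. */
struct toy_request {
	struct completion wait;
	pthread_t submitter;
	bool completed_inline;
};

/* Plays the role of pblk_end_io_sync(): note who ran us, then complete(). */
static void toy_end_io(struct toy_request *rq)
{
	rq->completed_inline = pthread_equal(pthread_self(), rq->submitter) != 0;
	complete(&rq->wait);
}

/*
 * Plays the role of a queue_rq() that polls the completion queue after
 * queueing. In this toy model the "device" has already finished, so the
 * poll finds the request and ends it in the caller's context.
 */
static void toy_queue_rq(struct toy_request *rq)
{
	toy_end_io(rq);
}

/* Synchronous submit: returns true if completion ran on the submitter. */
static bool toy_submit_sync(void)
{
	struct toy_request rq;

	init_completion(&rq.wait);
	rq.submitter = pthread_self();
	rq.completed_inline = false;

	toy_queue_rq(&rq);             /* complete() may already run here */
	wait_for_completion(&rq.wait); /* returns at once if it did */
	assert(rq.wait.done);

	pthread_cond_destroy(&rq.wait.cond);
	pthread_mutex_destroy(&rq.wait.lock);
	return rq.completed_inline;
}
```

In this model toy_submit_sync() always reports an inline completion, because the poll is unconditional. On real hardware the poll would only sometimes find the request, with the interrupt path handling the rest, which is presumably why a report like this does not trigger on every read.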

Hmm. It's still interesting that the FS is allowed to take the
rw_semaphore before we get to fully complete the read bio in pblk. I'll
look into it tomorrow.

Also, is it normal that we switch core when calling nvme_process_cq()
on the submission path?

Javier