> On 5 Oct 2017, at 18.24, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 10/05/2017 04:53 AM, Javier González wrote:
>> Hi,
>>
>> lockdep is reporting a circular dependency when using XFS and pblk,
>> which I am a bit confused about.
>>
>> This happens when XFS sends a number of nested reads and (at least) one
>> of them partially hits pblk's cache. In this case, pblk will retrieve
>> the cached lbas and form a new bio, which is submitted _synchronously_
>> to the media using a struct completion. The original bio is then
>> populated with the read data.
>>
>> What lockdep complains about is that the unlocking operation in
>> complete() has a circular dependency with inode->i_rwsem when they both
>> happen on the same core, which is different from the core that issued
>> wait_for_completion_io_timeout() and is waiting for the partial read.
>> However, AFAIU complete() happens in interrupt context, so this should
>> not be a problem.
>
> But the very trace you are posting shows the completion being done
> inline, since we catch it at submission time:
>
>> [ 8558.256328] complete+0x29/0x60
>> [ 8558.259469] pblk_end_io_sync+0x12/0x20
>> [ 8558.263297] nvm_end_io+0x2b/0x30
>> [ 8558.266607] nvme_nvm_end_io+0x2e/0x50
>> [ 8558.270351] blk_mq_end_request+0x3e/0x70
>> [ 8558.274360] nvme_complete_rq+0x1c/0xd0
>> [ 8558.278194] nvme_pci_complete_rq+0x7b/0x130
>> [ 8558.282459] __blk_mq_complete_request+0xa3/0x160
>> [ 8558.287156] blk_mq_complete_request+0x16/0x20
>> [ 8558.291592] nvme_process_cq+0xf8/0x1e0
>> [ 8558.295424] nvme_queue_rq+0x16e/0x9a0
>> [ 8558.299172] blk_mq_dispatch_rq_list+0x19e/0x330
>> [ 8558.303787] ? blk_mq_flush_busy_ctxs+0x91/0x130
>> [ 8558.308400] blk_mq_sched_dispatch_requests+0x19d/0x1d0
>> [ 8558.313617] __blk_mq_run_hw_queue+0x12e/0x1d0
>> [ 8558.318053] __blk_mq_delay_run_hw_queue+0xb9/0xd0
>> [ 8558.322837] blk_mq_run_hw_queue+0x14/0x20
>> [ 8558.326928] blk_mq_sched_insert_request+0xa4/0x180
>> [ 8558.331797] blk_execute_rq_nowait+0x72/0xf0
>> [ 8558.336061] nvme_nvm_submit_io+0xd9/0x130
>> [ 8558.340151] nvm_submit_io+0x3c/0x70
>> [ 8558.343723] pblk_submit_io+0x1b/0x20
>> [ 8558.347379] pblk_submit_read+0x1ec/0x3a0
>
> [snip]
>
> This happens since we call nvme_process_cq() after submitting IO,
> just in case there's something we can complete.

Hmm. It's still interesting that the FS is allowed to take the rw_semaphore
before we get to fully complete the read bio in pblk. I'll look into it
tomorrow.

Also, is it normal that we switch cores when calling nvme_process_cq() on
the submission path?

Javier
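
P.S. For anyone following along, the synchronous submission pattern being
discussed looks roughly like the sketch below. This is only an approximation
of what pblk does on a partial cache hit, not the actual code: the helper
names sync_end_io()/submit_and_wait() and the 30s timeout are made up for
illustration, while the lightnvm calls themselves match what the trace shows.
The point is that the submitter parks on a struct completion that the end_io
callback fires, and with the inline completion seen above that callback can
run on the submission path itself:

#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/lightnvm.h>

/* end_io callback: wake whoever is parked on the completion */
static void sync_end_io(struct nvm_rq *rqd)
{
	struct completion *waiting = rqd->private;

	complete(waiting);
}

/*
 * Submit an already-prepared nvm_rq and wait for it to finish.
 * nvm_submit_io() may end up completing the request inline (via
 * nvme_process_cq() on the submission path), in which case
 * sync_end_io() runs before we even reach the wait below.
 */
static int submit_and_wait(struct nvm_tgt_dev *dev, struct nvm_rq *rqd)
{
	DECLARE_COMPLETION_ONSTACK(wait);
	int ret;

	rqd->end_io = sync_end_io;
	rqd->private = &wait;

	ret = nvm_submit_io(dev, rqd);
	if (ret)
		return ret;

	/* wait_for_completion_io_timeout() returns 0 on timeout */
	if (!wait_for_completion_io_timeout(&wait, msecs_to_jiffies(30000)))
		return -ETIME;

	return 0;
}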