Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012

On 4/30/24 17:17, Yi Zhang wrote:
On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn
<Johannes.Thumshirn@xxxxxxx> wrote:
On 30.04.24 00:18, Chaitanya Kulkarni wrote:
On 4/29/24 07:35, Johannes Thumshirn wrote:
On 23.04.24 15:18, Yi Zhang wrote:
Hi,
I found this issue on the latest linux-block/for-next with blktests
nvme/tcp nvme/012. Please help check it and let me know if you need
any info or testing for it, thanks.

[ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47
[ 1873.761900] loop0: detected capacity change from 0 to 2097152
[ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 1874.208883] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 1874.243423] nvme nvme0: creating 48 I/O queues.
[ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[...]

[  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
[  327.475957] loop0: detected capacity change from 0 to 2097152
[  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  327.872343] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
support of multi-port devices.

Seems like you don't have multipath enabled; that is one difference
I can see between the log posted above by Yi and your log.

Yup, but even with multipath enabled I can't get the bug to trigger :(
It's not a 100% reproducible issue; I tried on another server and
it could not be reproduced there.

Looking at the trace, I think I can see the issue here. In the test case,
nvme-mpath fails the request upon submission because the queue is not live,
and because it is an mpath request, it is failed over using
nvme_failover_req(), which steals the bios from the request onto its private
requeue list.

The bisected patch introduces a req->bio dereference on a flush request that
has no bios (they were stolen by the failover sequence). The reproducibility
seems to be related to where in the flush sequence the request completion is
called.
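
For reference, here is a rough sketch of what blk_steal_bios()
(block/blk-core.c) does when nvme_failover_req() takes over the bios
(paraphrased from memory, not a verbatim copy); it is why the flush
request is left with neither ->bio nor ->biotail:

/*
 * blk_steal_bios(), roughly: move every bio from @rq onto @list and
 * leave the request with no bios at all.
 */
void blk_steal_bios(struct bio_list *list, struct request *rq)
{
	if (rq->bio) {
		if (list->tail)
			list->tail->bi_next = rq->bio;	/* append to the requeue list */
		else
			list->head = rq->bio;
		list->tail = rq->biotail;
	}

	/*
	 * From here on rq->bio and rq->biotail are NULL, so
	 * blk_flush_restore_request() restores rq->bio from a NULL
	 * rq->biotail and then dereferences it for rq->__sector,
	 * which is the crash in blk_flush_complete_seq() seen above.
	 */
	rq->bio = NULL;
	rq->biotail = NULL;
}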

I am unsure if simply making the dereference conditional is the correct fix or not... Damien?
--
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 2f58ae018464..c17cf8ed8113 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request *rq)
         * original @rq->bio.  Restore it.
         */
        rq->bio = rq->biotail;
-       rq->__sector = rq->bio->bi_iter.bi_sector;
+       if (rq->bio)
+               rq->__sector = rq->bio->bi_iter.bi_sector;

        /* make @rq a normal request */
        rq->rq_flags &= ~RQF_FLUSH_SEQ;
--
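
(If the bios were already stolen there is nothing left to restore, so with
the check above rq->__sector simply keeps whatever value it had; whether
that is acceptable for the rest of the flush sequence is exactly the part
I'm not sure about.)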



