On 2024/5/28 07:50, Chengming Zhou wrote: > On 2024/5/28 07:34, Chengming Zhou wrote: >> On 2024/5/28 00:04, Friedrich Weber wrote: >>> Hi Chengming, >>> >>> Thank you for taking a look at this! >>> >>> On 27/05/2024 07:09, Chengming Zhou wrote: >>>>> I've used this reproducer for a bisect, which produced >>>>> >>>>> 81ada09cc25e (blk-flush: reuse rq queuelist in flush state machine) >>>>> >>>>> as the first commit with which I can reproduce the crashes. I'm not 100% >>>>> sure it is this one because the reproducer is a bit flaky. But it does >>>>> sound plausible, as the commit is included in our 6.8 kernel, and >>>>> touches `queuelist` which is AFAICT where blk_flush_complete_seq >>>>> dereferences the NULL pointer. >>>> >>>> Ok, it will be better that I can reproduce it locally, will try later. >>> >>> Interestingly, so far I haven't been able to reproduce the crash when >>> generating IO on the host itself, I only got crashes when generating IO >>> in a QEMU VM. >>> >>> The reproducer in more detail: >> >> Thanks for these details, I will try to setup and reproduce when I back to work. >> >>> >>> - Compile Linux 6.9 with CONFIG_FAULT_INJECTION, >> [...] >>>> >>>> BUG shows it panic on 0000000000000008, not sure what it's accessing then, >>>> does it means rq->queuelist.next == 0 or something? Could you use add2line >>>> to show the exact source code line that panic? I use blk_flush_complete_seq+0x296/0x2e0 >>>> and get block/blk-flush.c:190, which is "fq->flush_data_in_flight++;", >>>> obviously fq can't be NULL. (I'm using the v6.9 kernel) >>> >>> Sorry for the confusion, the crash dump was from a kernel compiled at >>> 81ada09cc25e -- with 6.9, the offset seems to be different. See [2] for >>> a kernel 6.9 crash dump. >>> >>> I don't know too much about kernel debugging, but I tried to get >>> something useful out of addr2line: >>> >>> # addr2line -f -e /usr/lib/debug/vmlinux-6.9.0-debug2 >>> blk_flush_complete_seq+0x291/0x2d0 >>> __list_del >>> /[...]./include/linux/list.h:195 >>> >>> I tried to find the relevant portions in `objdump -SD blk-flush.o`, see >>> [3]. If I'm not mistaken, blk_flush_complete_seq+0x291 should point to >>> >>> 351: 48 89 4f 08 mov %rcx,0x8(%rdi) >>> >>> To me this looks like part of >>> >>> list_move_tail(&rq->queuelist, pending); >>> >>> What do you think? >> >> Yeah, it seems correct, so the rq->queuelist.next == NULL. It can't be NULL >> if went through REQ_FSEQ_POSTFLUSH, so it must be REQ_FSEQ_PREFLUSH. It means >> we allocated a request but its queuelist is not initialized or corrupted? >> >> Anyway, I will use below changes for debugging when reproduce, and you could >> also try this to see if we could get something useful. :) >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 3b4df8e5ac9e..6e3a6cd7739d 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2989,6 +2989,8 @@ void blk_mq_submit_bio(struct bio *bio) >> blk_mq_use_cached_rq(rq, plug, bio); >> } >> >> + BUG_ON(rq->queuelist.next == NULL); >> + >> trace_block_getrq(bio); >> >> rq_qos_track(q, rq, bio); >> @@ -3006,6 +3008,8 @@ void blk_mq_submit_bio(struct bio *bio) >> if (bio_zone_write_plugging(bio)) >> blk_zone_write_plug_init_request(rq); >> >> + BUG_ON(rq->queuelist.next == NULL); >> + >> if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq)) >> return; >> > > Ah, I forgot to change to your kernel version, then should be: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index d98654869615..908fdfb62132 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2963,6 +2963,8 @@ void blk_mq_submit_bio(struct bio *bio) > return; > } > > + BUG_ON(rq->queuelist.next == NULL); > + > trace_block_getrq(bio); > > rq_qos_track(q, rq, bio); > @@ -2977,6 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio) > return; > } > > + BUG_ON(rq->queuelist.next == NULL); > + > if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq)) > return; > Another possibility is that drivers may change rq->queuelist even after rq->end_io(). So add two more BUG_ON() to detect this: diff --git a/block/blk-flush.c b/block/blk-flush.c index e73dc22d05c1..0eb684a468e5 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -179,7 +179,10 @@ static void blk_flush_complete_seq(struct request *rq, switch (seq) { case REQ_FSEQ_PREFLUSH: + BUG_ON(rq->queuelist.next == NULL); + fallthrough; case REQ_FSEQ_POSTFLUSH: + BUG_ON(rq->queuelist.next == NULL); /* queue for flush */ if (list_empty(pending)) fq->flush_pending_since = jiffies; diff --git a/block/blk-mq.c b/block/blk-mq.c index d98654869615..908fdfb62132 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2963,6 +2963,8 @@ void blk_mq_submit_bio(struct bio *bio) return; } + BUG_ON(rq->queuelist.next == NULL); + trace_block_getrq(bio); rq_qos_track(q, rq, bio); @@ -2977,6 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio) return; } + BUG_ON(rq->queuelist.next == NULL); + if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq)) return;