On 2024/5/28 07:34, Chengming Zhou wrote: > On 2024/5/28 00:04, Friedrich Weber wrote: >> Hi Chengming, >> >> Thank you for taking a look at this! >> >> On 27/05/2024 07:09, Chengming Zhou wrote: >>>> I've used this reproducer for a bisect, which produced >>>> >>>> 81ada09cc25e (blk-flush: reuse rq queuelist in flush state machine) >>>> >>>> as the first commit with which I can reproduce the crashes. I'm not 100% >>>> sure it is this one because the reproducer is a bit flaky. But it does >>>> sound plausible, as the commit is included in our 6.8 kernel, and >>>> touches `queuelist` which is AFAICT where blk_flush_complete_seq >>>> dereferences the NULL pointer. >>> >>> Ok, it will be better that I can reproduce it locally, will try later. >> >> Interestingly, so far I haven't been able to reproduce the crash when >> generating IO on the host itself, I only got crashes when generating IO >> in a QEMU VM. >> >> The reproducer in more detail: > > Thanks for these details, I will try to setup and reproduce when I back to work. > >> >> - Compile Linux 6.9 with CONFIG_FAULT_INJECTION, > [...] >>> >>> BUG shows it panic on 0000000000000008, not sure what it's accessing then, >>> does it means rq->queuelist.next == 0 or something? Could you use add2line >>> to show the exact source code line that panic? I use blk_flush_complete_seq+0x296/0x2e0 >>> and get block/blk-flush.c:190, which is "fq->flush_data_in_flight++;", >>> obviously fq can't be NULL. (I'm using the v6.9 kernel) >> >> Sorry for the confusion, the crash dump was from a kernel compiled at >> 81ada09cc25e -- with 6.9, the offset seems to be different. See [2] for >> a kernel 6.9 crash dump. >> >> I don't know too much about kernel debugging, but I tried to get >> something useful out of addr2line: >> >> # addr2line -f -e /usr/lib/debug/vmlinux-6.9.0-debug2 >> blk_flush_complete_seq+0x291/0x2d0 >> __list_del >> /[...]./include/linux/list.h:195 >> >> I tried to find the relevant portions in `objdump -SD blk-flush.o`, see >> [3]. If I'm not mistaken, blk_flush_complete_seq+0x291 should point to >> >> 351: 48 89 4f 08 mov %rcx,0x8(%rdi) >> >> To me this looks like part of >> >> list_move_tail(&rq->queuelist, pending); >> >> What do you think? > > Yeah, it seems correct, so the rq->queuelist.next == NULL. It can't be NULL > if went through REQ_FSEQ_POSTFLUSH, so it must be REQ_FSEQ_PREFLUSH. It means > we allocated a request but its queuelist is not initialized or corrupted? > > Anyway, I will use below changes for debugging when reproduce, and you could > also try this to see if we could get something useful. :) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 3b4df8e5ac9e..6e3a6cd7739d 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2989,6 +2989,8 @@ void blk_mq_submit_bio(struct bio *bio) > blk_mq_use_cached_rq(rq, plug, bio); > } > > + BUG_ON(rq->queuelist.next == NULL); > + > trace_block_getrq(bio); > > rq_qos_track(q, rq, bio); > @@ -3006,6 +3008,8 @@ void blk_mq_submit_bio(struct bio *bio) > if (bio_zone_write_plugging(bio)) > blk_zone_write_plug_init_request(rq); > > + BUG_ON(rq->queuelist.next == NULL); > + > if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq)) > return; > Ah, I forgot to change to your kernel version, then should be: diff --git a/block/blk-mq.c b/block/blk-mq.c index d98654869615..908fdfb62132 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2963,6 +2963,8 @@ void blk_mq_submit_bio(struct bio *bio) return; } + BUG_ON(rq->queuelist.next == NULL); + trace_block_getrq(bio); rq_qos_track(q, rq, bio); @@ -2977,6 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio) return; } + BUG_ON(rq->queuelist.next == NULL); + if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq)) return;