On 05/06/2024 12:30, Chengming Zhou wrote:
> On 2024/6/5 16:45, Friedrich Weber wrote:
>> Hi,
>>
>> On 04/06/2024 08:47, Chengming Zhou wrote:
>>> Friedrich Weber reported a kernel crash problem and bisected to commit
>>> 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine").
>>>
>>> The root cause is that we use "list_move_tail(&rq->queuelist, pending)"
>>> in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since
>>> it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch().
>>> We don't initialize its queuelist just for this first request, although
>>> the queuelist of all later popped requests will be initialized.
>>>
>>> Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so
>>> rq->queuelist doesn't need to be initialized. It should be ok since rq
>>> can't be on any list when PREFLUSH or POSTFLUSH, has no move actually.
>>>
>>> Please note the commit 81ada09cc25e ("blk-flush: reuse rq queuelist in
>>> flush state machine") also has another requirement that no drivers would
>>> touch rq->queuelist after blk_mq_end_request() since we will reuse it to
>>> add rq to the post-flush pending list in POSTFLUSH. If this is not true,
>>> we will have to revert that commit IMHO.
>>
>> Unfortunately, with this patch applied to kernel 6.9 I get a different
>> crash [2] on a Debian 12 (virtual) machine with root on LVM on boot (no
>> software RAID involved). See [1] for lsblk and findmnt output. addr2line
>> says:
>
> Sorry, which commit is your kernel? Is mainline tag v6.9 or at some commit?

Yes, by "kernel 6.9" I meant mainline tag v6.9, so commit a38297e3fb01.

If I boot this mainline kernel v6.9 in a Debian (virtual) machine with
root on LVM, I do not get a crash. If I apply the patch "block: fix
request.queuelist usage in flush" on top of this mainline kernel v6.9,
and boot the Debian machine into that patched kernel, I get a crash on
boot.

> And is it reproducible using the mainline kernel v6.10-rc2?

I'll test mainline kernel v6.10-rc2, and "block: fix request.queuelist
usage in flush" applied on top of v6.10-rc2, and get back to you.

>> # addr2line -f -e /usr/lib/debug/vmlinux-6.9.0-patch0604-nodebuglist+
>> blk_mq_request_bypass_insert+0x20
>
> I think here should use blk_mq_insert_request+0x120, instead of the
> blk_mq_request_bypass_insert+0x20, which has "?" at the beginning.
>

Right, sorry:

# addr2line -f -e /usr/lib/debug/vmlinux-6.9.0-patch0604-nodebuglist+ blk_mq_insert_request+0x120
blk_mq_insert_request
[...]/linux/block/blk-mq.c:2539

which refers to this line [1]:

	blk_mq_request_bypass_insert(rq, BLK_MQ_INSERT_AT_HEAD);

Thanks!

Friedrich

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-mq.c?h=v6.9#n2539
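
For reference, the difference between list_move_tail() and list_add_tail()
described in the quoted patch description can be sketched in userspace C.
This is only a minimal sketch and not kernel code: struct list_head and the
helpers are simplified reimplementations, other_len_after_queueing() is a
hypothetical helper, and the stale pointer values are set by hand to stand in
for a request whose queuelist was never reinitialized.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Only writes entry's pointers; it never reads them. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *last = head->prev;

	entry->next = head;
	entry->prev = last;
	last->next = entry;
	head->prev = entry;
}

/* Unlinks entry first, so it dereferences entry->next and entry->prev. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	list_add_tail(entry, head);
}

/* Count entries, bounded so a corrupted list cannot loop forever. */
static int list_len(const struct list_head *head)
{
	const struct list_head *p;
	int n = 0;

	for (p = head->next; p != head && n < 16; p = p->next)
		n++;
	return n;
}

/*
 * Hypothetical scenario: "other" holds two entries, and rq still carries
 * stale pointers into it although rq is not on that list.
 * Returns the length of "other" after rq is queued on "pending".
 */
static int other_len_after_queueing(int use_move)
{
	struct list_head other, pending, a, b, rq;

	init_list_head(&other);
	init_list_head(&pending);
	list_add_tail(&a, &other);
	list_add_tail(&b, &other);

	/* Stale, never reinitialized pointers (assumed values). */
	rq.next = &b;
	rq.prev = &other;

	if (use_move)
		list_move_tail(&rq, &pending);
	else
		list_add_tail(&rq, &pending);

	assert(list_len(&pending) == 1);
	return list_len(&other);
}
```

Under these assumptions, other_len_after_queueing(0) returns 2 and
other_len_after_queueing(1) returns 1: the list_move_tail() variant writes
through the stale pointers and drops an entry from a list that rq was never
on, while list_add_tail() leaves that list untouched.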