On Tue, Feb 11, 2025 at 8:29 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Feb 11, 2025 at 08:13:16PM +0800, Ming Lei wrote:
> > On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> > > While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> > > into a consistent NULL ptr dereference within __blk_rq_map_sg when
> > > booting the system.
> > >
> > > Using git bisect I was able to narrow down the "bad" commit to:
> > >
> > > block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> > >
> > > Building a kernel with the parent commit
> > > (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> > > not fail.
> > >
> > > Following is the console log showing the error as well as the Xen
> > > (libvirt) configuration for the guest that I'm using.
> > >
> > > Please let me know if there is any additional information that I can provide.
> >
> > Can you test the following patch?
>
> Please try the revised one:
>
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 15cd231d560c..a66d087a6b55 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
>                  return true;
>          }
>
> -        if (!iter->iter.bi_size)
> +        if (!iter->bio || !iter->iter.bi_size)
>                  return false;
>
>          bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
> @@ -514,6 +514,8 @@ static bool blk_map_iter_next(struct request *req,
>                  if (!iter->bio->bi_next)
>                          break;
>                  iter->bio = iter->bio->bi_next;
> +                if (!iter->bio)
> +                        break;
>                  iter->iter = iter->bio->bi_iter;
>          }
>
>
> Thanks,
> Ming
>

Still getting a BUG at the same location.

I was able to capture the BUG using a Xen gdbsx/gdb session. The offending
instruction is mov 0x28(%rdx),%r13d, and the problem is that %rdx is zero
(I caught it with: break *__blk_rq_map_sg+0x5e if $rdx == 0).

It appears that rq->bio is already NULL at the start of __blk_rq_map_sg:
Breakpoint 1, __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
(gdb) bt
#0  __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
#1  0xffffffff81db3a27 in blk_rq_map_sg (sglist=<optimized out>, rq=0xffff888102231300, q=<optimized out>) at ./include/linux/blk-mq.h:1165
#2  blkif_queue_rw_req (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:754
#3  blkif_queue_request (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:880
#4  blkif_queue_rq (hctx=0xffff888102205c00, qd=<optimized out>) at drivers/block/xen-blkfront.c:921
#5  0xffffffff818c1867 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff888102205c00, list=list@entry=0xffffc90000137d38, nr_budgets=nr_budgets@entry=0) at block/blk-mq.c:2120
#6  0xffffffff818c7ca0 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:301
#7  0xffffffff818c820d in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:331
#8  0xffffffff818bdbdc in blk_mq_run_hw_queue (hctx=0xffff888102205c00, async=async@entry=false) at block/blk-mq.c:2354
#9  0xffffffff818bec87 in blk_mq_run_hw_queues (q=q@entry=0xffff888100d49b00, async=async@entry=false) at block/blk-mq.c:2403
#10 0xffffffff818bfc52 in blk_mq_requeue_work (work=0xffff888100d49cf8) at block/blk-mq.c:1568
#11 0xffffffff812c5528 in process_one_work (worker=worker@entry=0xffff888100c253c0, work=0xffff888100d49cf8) at kernel/workqueue.c:3236
#12 0xffffffff812c668b in process_scheduled_works (worker=<optimized out>) at kernel/workqueue.c:3317
#13 worker_thread (__worker=0xffff888100c253c0) at kernel/workqueue.c:3398
#14 0xffffffff812cfaf1 in kthread (_create=<optimized out>) at kernel/kthread.c:464
#15 0xffffffff812502d4 in ret_from_fork (prev=<optimized out>, regs=0xffffc90000137f58, fn=0xffffffff812cfa00 <kthread>, fn_arg=0xffff888100c26340) at arch/x86/kernel/process.c:148
#16 0xffffffff812024aa in ret_from_fork_asm () at arch/x86/entry/entry_64.S:244
#17 0x0000000000000000 in ?? ()
(gdb) print *rq
$1 = {
  q = 0xffff888100d49b00,
  mq_ctx = 0xffff888206c37b00,
  mq_hctx = 0xffff888102205c00,
  cmd_flags = 262146,
  rq_flags = 2,
  tag = 2,
  internal_tag = 59,
  timeout = 30000,
  __data_len = 0,
  __sector = 18446744073709551615,
  bio = 0x0 <fixed_percpu_data>,
  biotail = 0x0 <fixed_percpu_data>,
  {
    queuelist = {
      next = 0xffff888102231348,
      prev = 0xffff888102231348
    },
    rq_next = 0xffff888102231348
  },
  part = 0x0 <fixed_percpu_data>,
  start_time_ns = 62585793058,
  io_start_time_ns = 0,
  stats_sectors = 0,
  nr_phys_segments = 0,
  nr_integrity_segments = 0,
  state = MQ_RQ_IN_FLIGHT,
  ref = {
    counter = 1
  },
  deadline = 4294759798,
  {
    hash = {
      next = 0x0 <fixed_percpu_data>,
      pprev = 0x0 <fixed_percpu_data>
    },
    ipi_list = {
      next = 0x0 <fixed_percpu_data>
    }
  },
  {
    rb_node = {
      __rb_parent_color = 18446612686400852888,
      rb_right = 0x0 <fixed_percpu_data>,
      rb_left = 0x0 <fixed_percpu_data>
    },
    special_vec = {
      bv_page = 0xffff888102231398,
      bv_len = 0,
      bv_offset = 0
    }
  },
  elv = {
    icq = 0x0 <fixed_percpu_data>,
    priv = {0x0 <fixed_percpu_data>, 0x0 <fixed_percpu_data>}
  },
  flush = {
    seq = 0,
    saved_end_io = 0x0 <fixed_percpu_data>
  },
  fifo_time = 0,
  end_io = 0xffffffff818b56b0 <flush_end_io>,
  end_io_data = 0x0 <fixed_percpu_data>
}

Since rq->bio is NULL (and __data_len is 0), I suspect that the NULL
dereference is in the initialization of the req_iterator itself:

        struct req_iterator iter = {
                .bio    = rq->bio,
                .iter   = rq->bio->bi_iter,     <<< here
        };

Again, let me know if there is any other information that I can provide.

Cheyenne