Re: BUG: NULL pointer dereferenced within __blk_rq_map_sg

On Tue, Feb 11, 2025 at 8:29 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Feb 11, 2025 at 08:13:16PM +0800, Ming Lei wrote:
> > On Fri, Feb 07, 2025 at 07:09:39PM -0700, Cheyenne Wills wrote:
> > > While I was setting up to test with linux 6.14-rc1 (under Xen), I ran
> > > into a consistent NULL ptr dereference within __blk_rq_map_sg when
> > > booting the system.
> > >
> > > Using git bisect I was able to narrow down the "bad" commit to:
> > >
> > > block: add a dma mapping iterator (b7175e24d6acf79d9f3af9ce9d3d50de1fa748ec)
> > >
> > > Building a kernel with the parent commit
> > > (2caca8fc7aad9ea9a6ea3ed26ed146b1e5f06fab) using the same .config does
> > > not fail.
> > >
> > > Following is the console log showing the error as well as the Xen
> > > (libvirt) configuration for the guest that I'm using.
> > >
> > > Please let me know if there is any additional information that I can provide.
> >
> > Can you test the following patch?
> >
>
> Please try the revised one:
>
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 15cd231d560c..a66d087a6b55 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -493,7 +493,7 @@ static bool blk_map_iter_next(struct request *req,
>                 return true;
>         }
>
> -       if (!iter->iter.bi_size)
> +       if (!iter->bio || !iter->iter.bi_size)
>                 return false;
>
>         bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
> @@ -514,6 +514,8 @@ static bool blk_map_iter_next(struct request *req,
>                         if (!iter->bio->bi_next)
>                                 break;
>                         iter->bio = iter->bio->bi_next;
> +                       if (!iter->bio)
> +                               break;
>                         iter->iter = iter->bio->bi_iter;
>                 }
>
>
>
>
> Thanks,
> Ming
>

Still getting a BUG at the same location.

I was able to capture the BUG using a Xen gdbsx/gdb session. The
offending instruction is mov 0x28(%rdx),%r13d, and it faults because
%rdx is zero (conditional breakpoint used:
break *__blk_rq_map_sg+0x5e if $rdx == 0).

It appears that rq->bio is already NULL on entry to __blk_rq_map_sg.

Breakpoint 1, __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
(gdb) bt
#0  __blk_rq_map_sg (q=<optimized out>, rq=rq@entry=0xffff888102231300, sglist=0xffff88810222f600, last_sg=last_sg@entry=0xffffc90000137c08) at block/blk-merge.c:568
#1  0xffffffff81db3a27 in blk_rq_map_sg (sglist=<optimized out>, rq=0xffff888102231300, q=<optimized out>) at ./include/linux/blk-mq.h:1165
#2  blkif_queue_rw_req (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:754
#3  blkif_queue_request (rinfo=0xffff88810088c000, req=0xffff888102231300) at drivers/block/xen-blkfront.c:880
#4  blkif_queue_rq (hctx=0xffff888102205c00, qd=<optimized out>) at drivers/block/xen-blkfront.c:921
#5  0xffffffff818c1867 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff888102205c00, list=list@entry=0xffffc90000137d38, nr_budgets=nr_budgets@entry=0) at block/blk-mq.c:2120
#6  0xffffffff818c7ca0 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:301
#7  0xffffffff818c820d in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff888102205c00) at block/blk-mq-sched.c:331
#8  0xffffffff818bdbdc in blk_mq_run_hw_queue (hctx=0xffff888102205c00, async=async@entry=false) at block/blk-mq.c:2354
#9  0xffffffff818bec87 in blk_mq_run_hw_queues (q=q@entry=0xffff888100d49b00, async=async@entry=false) at block/blk-mq.c:2403
#10 0xffffffff818bfc52 in blk_mq_requeue_work (work=0xffff888100d49cf8) at block/blk-mq.c:1568
#11 0xffffffff812c5528 in process_one_work (worker=worker@entry=0xffff888100c253c0, work=0xffff888100d49cf8) at kernel/workqueue.c:3236
#12 0xffffffff812c668b in process_scheduled_works (worker=<optimized out>) at kernel/workqueue.c:3317
#13 worker_thread (__worker=0xffff888100c253c0) at kernel/workqueue.c:3398
#14 0xffffffff812cfaf1 in kthread (_create=<optimized out>) at kernel/kthread.c:464
#15 0xffffffff812502d4 in ret_from_fork (prev=<optimized out>, regs=0xffffc90000137f58, fn=0xffffffff812cfa00 <kthread>, fn_arg=0xffff888100c26340) at arch/x86/kernel/process.c:148
#16 0xffffffff812024aa in ret_from_fork_asm () at arch/x86/entry/entry_64.S:244
#17 0x0000000000000000 in ?? ()
(gdb) print *rq
$1 = {
  q = 0xffff888100d49b00,
  mq_ctx = 0xffff888206c37b00,
  mq_hctx = 0xffff888102205c00,
  cmd_flags = 262146,
  rq_flags = 2,
  tag = 2,
  internal_tag = 59,
  timeout = 30000,
  __data_len = 0,
  __sector = 18446744073709551615,
  bio = 0x0 <fixed_percpu_data>,
  biotail = 0x0 <fixed_percpu_data>,
  {
    queuelist = {
      next = 0xffff888102231348,
      prev = 0xffff888102231348
    },
    rq_next = 0xffff888102231348
  },
  part = 0x0 <fixed_percpu_data>,
  start_time_ns = 62585793058,
  io_start_time_ns = 0,
  stats_sectors = 0,
  nr_phys_segments = 0,
  nr_integrity_segments = 0,
  state = MQ_RQ_IN_FLIGHT,
  ref = {
    counter = 1
  },
  deadline = 4294759798,
  {
    hash = {
      next = 0x0 <fixed_percpu_data>,
      pprev = 0x0 <fixed_percpu_data>
    },
    ipi_list = {
      next = 0x0 <fixed_percpu_data>
    }
  },
  {
    rb_node = {
      __rb_parent_color = 18446612686400852888,
      rb_right = 0x0 <fixed_percpu_data>,
      rb_left = 0x0 <fixed_percpu_data>
    },
    special_vec = {
      bv_page = 0xffff888102231398,
      bv_len = 0,
      bv_offset = 0
    }
  },
  elv = {
    icq = 0x0 <fixed_percpu_data>,
    priv = {0x0 <fixed_percpu_data>, 0x0 <fixed_percpu_data>}
  },
  flush = {
    seq = 0,
    saved_end_io = 0x0 <fixed_percpu_data>
  },
  fifo_time = 0,
  end_io = 0xffffffff818b56b0 <flush_end_io>,
  end_io_data = 0x0 <fixed_percpu_data>
}
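The dump itself hints at why rq->bio is NULL: __data_len is 0 and end_io is flush_end_io. A minimal sketch for decoding cmd_flags, assuming the layout in include/linux/blk_types.h where the low 8 bits (REQ_OP_BITS) hold the operation and REQ_OP_FLUSH is 2. The SKETCH_* names and the helper functions are hypothetical illustrations, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout, following include/linux/blk_types.h: the low
 * REQ_OP_BITS (8) bits of cmd_flags hold the operation and the higher
 * bits hold REQ_* modifier flags. SKETCH_* names are hypothetical.
 */
#define SKETCH_REQ_OP_BITS 8
#define SKETCH_REQ_OP_MASK ((1u << SKETCH_REQ_OP_BITS) - 1)

static_assert(SKETCH_REQ_OP_MASK == 0xffu, "operation field assumed to be 8 bits");

/* Only the first few operation codes are mirrored here. */
enum sketch_req_op {
	SKETCH_REQ_OP_READ = 0,
	SKETCH_REQ_OP_WRITE = 1,
	SKETCH_REQ_OP_FLUSH = 2,
	SKETCH_REQ_OP_DISCARD = 3,
};

/* Operation part of cmd_flags. */
static unsigned int req_op_of(uint32_t cmd_flags)
{
	return cmd_flags & SKETCH_REQ_OP_MASK;
}

/* Modifier-flag part of cmd_flags (operation bits cleared). */
static uint32_t req_modifiers_of(uint32_t cmd_flags)
{
	return cmd_flags & ~(uint32_t)SKETCH_REQ_OP_MASK;
}

/* True for a flush carrying no payload, for which rq->bio may be NULL. */
static bool is_empty_flush(uint32_t cmd_flags, unsigned int data_len)
{
	return req_op_of(cmd_flags) == SKETCH_REQ_OP_FLUSH && data_len == 0;
}
```

Applied to the dump, cmd_flags = 262146 (0x40002) decodes to operation 2 (REQ_OP_FLUSH) plus a single modifier bit, 0x40000 (bit 18); which REQ_* flag that bit is depends on the req_flag_bits enum of the kernel being debugged. Together with __data_len = 0 and end_io = flush_end_io, this suggests the request is a data-less flush, which would be consistent with rq->bio being NULL.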


I suspect the NULL dereference happens in the initialization of the
req_iterator itself, which reads rq->bio->bi_iter without first
checking rq->bio:

	struct req_iterator iter = {
		.bio = rq->bio,
		.iter = rq->bio->bi_iter,	<<< here
	};
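The failure mode described above can be illustrated outside the kernel. A minimal user-space sketch follows; the structs are hypothetical, simplified stand-ins (not the real kernel definitions), and the guarded req_iter_init() and count_mapped_bios() helpers are illustrative assumptions, not necessarily how upstream resolves the bug. It sets up the iterator so that rq->bio->bi_iter is only read once rq->bio is known to be non-NULL:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified stand-ins for the kernel's struct
 * bvec_iter, struct bio, struct request and struct req_iterator. Only
 * the fields needed to illustrate a request with no bio are kept.
 */
struct bvec_iter {
	unsigned int bi_size;		/* bytes left in this bio */
};

struct bio {
	struct bio *bi_next;		/* next bio in the request's chain */
	struct bvec_iter bi_iter;
};

struct request {
	struct bio *bio;		/* NULL for a data-less request */
};

struct req_iterator {
	struct bio *bio;
	struct bvec_iter iter;
};

/*
 * Guarded initialization: unlike the quoted initializer, bi_iter is only
 * read after checking that rq->bio is non-NULL. Members not named in the
 * designated initializer are zeroed, so a request without a bio yields
 * iter.bio == NULL and iter.iter.bi_size == 0.
 */
static struct req_iterator req_iter_init(const struct request *rq)
{
	struct req_iterator iter = { .bio = rq->bio };

	if (iter.bio)
		iter.iter = iter.bio->bi_iter;
	return iter;
}

/*
 * Hypothetical walker: counts bios until the chain ends or a bio with
 * bi_size == 0 is reached; returns 0 at once for a request with no bio.
 */
static int count_mapped_bios(const struct request *rq)
{
	struct req_iterator iter;
	int n = 0;

	assert(rq != NULL);
	iter = req_iter_init(rq);
	while (iter.bio && iter.iter.bi_size) {
		n++;
		iter.bio = iter.bio->bi_next;
		if (iter.bio)
			iter.iter = iter.bio->bi_iter;
	}
	return n;
}
```

With a request whose bio is NULL (like the flush request in the dump), req_iter_init() returns an empty iterator and count_mapped_bios() returns 0 instead of faulting; a request with a two-bio chain of 4096 and 512 bytes returns 2.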

Again, let me know if there is any other information I can provide.

Cheyenne




