Re: [PATCH] ublk_drv: fix NULL pointer dereference in ublk_ctrl_start_recovery()

On Wed, Jun 5, 2024 at 5:48 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Wed, Jun 05, 2024 at 03:20:34PM +0800, Changhui Zhong wrote:
> > On Wed, Jun 5, 2024 at 9:41 AM Li Nan <linan666@xxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 2024/6/4 9:32, Changhui Zhong wrote:
> > > > On Mon, Jun 3, 2024 at 10:20 AM Li Nan <linan666@xxxxxxxxxxxxxxx> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2024/6/3 8:39, Ming Lei wrote:
> > > >>
> > > >> [...]
> > > >>
> > > >>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > >>>> index 4e159948c912..99b621b2d40f 100644
> > > >>>> --- a/drivers/block/ublk_drv.c
> > > >>>> +++ b/drivers/block/ublk_drv.c
> > > >>>> @@ -2630,7 +2630,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
> > > >>>>    {
> > > >>>>       int i;
> > > >>>>
> > > >>>> -    WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq)));
> > > >>>> +    if (WARN_ON_ONCE(!(ubq->ubq_daemon && ubq_daemon_is_dying(ubq))))
> > > >>>> +            return;
> > > >>>
> > > >>> Yeah, this is a bug. However, it could be addressed by adding a check in
> > > >>> ublk_ctrl_start_recovery() and returning immediately if ubq->ubq_daemon
> > > >>> is NULL. What do you think of this approach?
> > > >>>
> > > >>
> > > >> Checking ub->nr_queues_ready seems better. How about:
> > > >>
> > > >> @@ -2662,6 +2662,8 @@ static int ublk_ctrl_start_recovery(struct ublk_device *ub,
> > > >>           mutex_lock(&ub->mutex);
> > > >>           if (!ublk_can_use_recovery(ub))
> > > >>                   goto out_unlock;
> > > >> +       if (!ub->nr_queues_ready)
> > > >> +               goto out_unlock;
> > > >>           /*
> > > >>            * START_RECOVERY is only allowd after:
> > > >>            *
> > > >>
> > > >>>
> > > >>> Thanks,
> > > >>> Ming
> > > >>
> > > >> --
> > > >> Thanks,
> > > >> Nan
> > > >>
> > > >
> > > >
> > > > Hi Nan,
> > > >
> > > > After applying your new patch, I no longer trigger the "NULL pointer
> > > > dereference" or the "Warning", but I do hit a hung task.
> > > > Please check the call trace below:
> > > >
> > > > [13617.812306] running generic/004
> > > > [13622.293674] blk_print_req_error: 91 callbacks suppressed
> > > > [13622.293681] I/O error, dev ublkb4, sector 233256 op 0x1:(WRITE)
> > > > flags 0x8800 phys_seg 1 prio class 0
> > > > [13622.308145] I/O error, dev ublkb4, sector 233256 op 0x0:(READ)
> > > > flags 0x0 phys_seg 2 prio class 0
> > > > [13622.316923] I/O error, dev ublkb4, sector 233264 op 0x1:(WRITE)
> > > > flags 0x8800 phys_seg 1 prio class 0
> > > > [13622.326048] I/O error, dev ublkb4, sector 233272 op 0x0:(READ)
> > > > flags 0x0 phys_seg 1 prio class 0
> > > > [13622.334828] I/O error, dev ublkb4, sector 233272 op 0x1:(WRITE)
> > > > flags 0x8800 phys_seg 1 prio class 0
> > > > [13622.343954] I/O error, dev ublkb4, sector 233312 op 0x0:(READ)
> > > > flags 0x0 phys_seg 1 prio class 0
> > > > [13622.352733] I/O error, dev ublkb4, sector 233008 op 0x0:(READ)
> > > > flags 0x0 phys_seg 1 prio class 0
> > > > [13622.361514] I/O error, dev ublkb4, sector 233112 op 0x0:(READ)
> > > > flags 0x0 phys_seg 1 prio class 0
> > > > [13622.370292] I/O error, dev ublkb4, sector 233192 op 0x1:(WRITE)
> > > > flags 0x8800 phys_seg 1 prio class 0
> > > > [13622.379419] I/O error, dev ublkb4, sector 233120 op 0x0:(READ)
> > > > flags 0x0 phys_seg 1 prio class 0
> > > > [13641.069695] INFO: task fio:174413 blocked for more than 122 seconds.
> > > > [13641.076061]       Not tainted 6.10.0-rc1+ #1
> > > > [13641.080338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > > disables this message.
> > > > [13641.088164] task:fio             state:D stack:0     pid:174413
> > > > tgid:174413 ppid:174386 flags:0x00004002
> > > > [13641.088168] Call Trace:
> > > > [13641.088170]  <TASK>
> > > > [13641.088171]  __schedule+0x221/0x670
> > > > [13641.088177]  schedule+0x23/0xa0
> > > > [13641.088179]  io_schedule+0x42/0x70
> > > > [13641.088181]  blk_mq_get_tag+0x118/0x2b0
> > > > [13641.088185]  ? gup_fast_pgd_range+0x280/0x370
> > > > [13641.088188]  ? __pfx_autoremove_wake_function+0x10/0x10
> > > > [13641.088192]  __blk_mq_alloc_requests+0x194/0x3a0
> > > > [13641.088194]  blk_mq_submit_bio+0x241/0x6c0
> > > > [13641.088196]  __submit_bio+0x8a/0x1f0
> > > > [13641.088199]  submit_bio_noacct_nocheck+0x168/0x250
> > > > [13641.088201]  ? submit_bio_noacct+0x45/0x560
> > > > [13641.088203]  __blkdev_direct_IO_async+0x167/0x1a0
> > > > [13641.088206]  blkdev_write_iter+0x1c8/0x270
> > > > [13641.088208]  aio_write+0x11c/0x240
> > > > [13641.088212]  ? __rq_qos_issue+0x21/0x40
> > > > [13641.088214]  ? blk_mq_start_request+0x34/0x1a0
> > > > [13641.088216]  ? io_submit_one+0x68/0x380
> > > > [13641.088218]  ? kmem_cache_alloc_noprof+0x4e/0x320
> > > > [13641.088221]  ? fget+0x7c/0xc0
> > > > [13641.088224]  ? io_submit_one+0xde/0x380
> > > > [13641.088226]  io_submit_one+0xde/0x380
> > > > [13641.088228]  __x64_sys_io_submit+0x80/0x160
> > > > [13641.088229]  do_syscall_64+0x79/0x150
> > > > [13641.088233]  ? syscall_exit_to_user_mode+0x6c/0x1f0
> > > > [13641.088237]  ? do_io_getevents+0x8b/0xe0
> > > > [13641.088238]  ? syscall_exit_work+0xf3/0x120
> > > > [13641.088241]  ? syscall_exit_to_user_mode+0x6c/0x1f0
> > > > [13641.088243]  ? do_syscall_64+0x85/0x150
> > > > [13641.088245]  ? do_syscall_64+0x85/0x150
> > > > [13641.088247]  ? blk_mq_flush_plug_list.part.0+0x108/0x160
> > > > [13641.088249]  ? rseq_get_rseq_cs+0x1d/0x220
> > > > [13641.088252]  ? rseq_ip_fixup+0x6d/0x1d0
> > > > [13641.088254]  ? blk_finish_plug+0x24/0x40
> > > > [13641.088256]  ? syscall_exit_to_user_mode+0x6c/0x1f0
> > > > [13641.088258]  ? do_syscall_64+0x85/0x150
> > > > [13641.088260]  ? syscall_exit_to_user_mode+0x6c/0x1f0
> > > > [13641.088262]  ? do_syscall_64+0x85/0x150
> > > > [13641.088264]  ? syscall_exit_to_user_mode+0x6c/0x1f0
> > > > [13641.088266]  ? do_syscall_64+0x85/0x150
> > > > [13641.088268]  ? do_syscall_64+0x85/0x150
> > > > [13641.088270]  ? do_syscall_64+0x85/0x150
> > > > [13641.088272]  ? clear_bhb_loop+0x45/0xa0
> > > > [13641.088275]  ? clear_bhb_loop+0x45/0xa0
> > > > [13641.088277]  ? clear_bhb_loop+0x45/0xa0
> > > > [13641.088279]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > [13641.088281] RIP: 0033:0x7ff92150713d
> > > > [13641.088283] RSP: 002b:00007ffca1ef81f8 EFLAGS: 00000246 ORIG_RAX:
> > > > 00000000000000d1
> > > > [13641.088285] RAX: ffffffffffffffda RBX: 00007ff9217e2f70 RCX: 00007ff92150713d
> > > > [13641.088286] RDX: 000055863b694fe0 RSI: 0000000000000010 RDI: 00007ff92164d000
> > > > [13641.088287] RBP: 00007ff92164d000 R08: 00007ff91936d000 R09: 0000000000000180
> > > > [13641.088288] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000010
> > > > [13641.088289] R13: 0000000000000000 R14: 000055863b694fe0 R15: 000055863b6970c0
> > > > [13641.088291]  </TASK>
> > > >
> > > > Thanks,
> > > > Changhui
> > > >
> > >
> > > After applying the previous patch, does the test environment continue to
> > > execute test cases after the WARN?
> >
> > A few days ago, when I tested with the previous patch, the test
> > environment continued to execute test cases after the WARN.
> > I terminated the test as soon as I observed the WARN, so I did not
> > see what happened afterwards.
> >
> > > I am not sure whether this issue has always
> > > existed but was never reached because of the WARN, or whether the new
> > > patch introduced it.
> >
> > Today I re-tested the previous patch and let it run for a long time. I
> > observed both the WARN and the hung task, so it looks like this issue
> > already existed and was not introduced by the new patch.
>
> Hi Changhui,
>
> The hang is actually expected because recovery fails.
>
> Please pull the latest ublksrv and check if the issue can still be
> reproduced:
>
> https://github.com/ublk-org/ublksrv
>
> BTW, one ublksrv segfault and two test cleanup issues are fixed.
>
> Thanks,
> Ming
>

Hi Ming and Nan,

After applying the new patch and pulling the latest ublksrv,
I ran the test for 4 hours and did not observe any task hang.
The test results look good!

Thanks,
Changhui
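
P.S. For anyone reading the archive later, here is a minimal user-space
sketch of the guard discussed above: refuse to start recovery when no queue
has a daemon attached, instead of reinitializing queues whose daemon pointer
is NULL. It is not the kernel code. The toy_* names, the two-queue layout,
and the -EINVAL return value are assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NR_QUEUES 2

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct toy_queue {
	const char *daemon;	/* NULL until a server task attaches */
	int nr_io_ready;
};

struct toy_device {
	int nr_queues_ready;	/* queues that currently have a daemon */
	struct toy_queue queues[TOY_NR_QUEUES];
};

/*
 * Models the reinit step, which assumes the daemon pointer is valid.
 * The assert stands in for what would be a NULL dereference in the kernel.
 */
void toy_queue_reinit(struct toy_queue *q)
{
	assert(q->daemon != NULL);
	q->daemon = NULL;
	q->nr_io_ready = 0;
}

/*
 * Models start-recovery with the proposed guard: if no queue is ready,
 * bail out before any queue is touched.
 * Assumption: when nr_queues_ready is non-zero, every queue has a daemon.
 */
int toy_start_recovery(struct toy_device *dev)
{
	int i;

	if (!dev->nr_queues_ready)
		return -EINVAL;	/* illustrative error code */

	for (i = 0; i < TOY_NR_QUEUES; i++)
		toy_queue_reinit(&dev->queues[i]);
	dev->nr_queues_ready = 0;
	return 0;
}
```

Calling toy_start_recovery() on a device with nr_queues_ready == 0 returns
the error without touching any queue. Once both queues have a daemon
attached, the same call resets them and returns 0, and a second call is
rejected again by the guard.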





