Re: io_uring: worker thread NULL dereference during openat op

On Mon, Apr 15, 2024 at 7:26 PM Dan Clash <daclash@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Below is a test program that causes multiple io_uring worker threads to
> hit a NULL dereference while executing openat ops.
>
> The test program hangs forever in a D state.  The test program can be
> run again after the NULL dereferences.  However, there are long delays
> at reboot time because the io_uring_cancel() during do_exit() attempts
> to wake the worker threads.
>
> The test program is single threaded but it queues multiple openat and
> close ops with IOSQE_ASYNC set before waiting for completions.
>
> I collected trace with /sys/kernel/tracing/events/io_uring/enable
> enabled if that is helpful.
>
> The test program reproduces similar problems in the following releases.
>
> mainline v6.9-rc3
> stable 6.8.5
> Ubuntu 6.5.0-1018-azure
>
> The test program does not reproduce the problem in Ubuntu
> 5.15.0-1052-azure, which does not have the io_uring audit changes.
>
> The following is the first io_uring worker thread backtrace in the repro
> against v6.9-rc3.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 4628 Comm: iou-wrk-4605 Not tainted 6.9.0-rc3 #2
> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
> BIOS Hyper-V UEFI Release v4.1 11/28/2023
> RIP: 0010:strlen (lib/string.c:402)
> Call Trace:
>   <TASK>
> ? show_regs (arch/x86/kernel/dumpstack.c:479)
> ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
> ? page_fault_oops (arch/x86/mm/fault.c:713)
> ? do_user_addr_fault (arch/x86/mm/fault.c:1261)
> ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37
> ./arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1513
> arch/x86/mm/fault.c:1563)
> ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
> ? __pfx_strlen (lib/string.c:402)
> ? parent_len (kernel/auditfilter.c:1284)
> __audit_inode (kernel/auditsc.c:2381 (discriminator 4))

Thanks for the well-documented bug report!

That's interesting, it looks like audit_inode() is potentially being
passed a filename struct with a NULL name field (filename::name ==
NULL).  Given the IOSQE_ASYNC and what looks like io_uring calling
getname() from within the __io_openat_prep() function, I suspect the
issue is that we aren't associating the filename information we
collect in getname() with the proper audit_context().  In other words,
we do the getname() in one context, and then the actual open operation
in another, and the audit filename info is lost in the switch.
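
To make the suspected pattern concrete, here is a minimal userspace
sketch in C.  It is only a toy model of the hypothesis above, not kernel
code: every toy_* name is hypothetical, and the two context structs
merely stand in for the submitter's and the worker's audit_context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's structures. */
struct toy_audit_ctx {
    const char *recorded_name;  /* name noted when the path was copied in */
};

struct toy_filename {
    const char *name;
};

/*
 * Models the getname() step: the path is captured and recorded in
 * whichever context is current at that moment (assumption: the
 * submitting task's context, during prep).
 */
void toy_getname(struct toy_audit_ctx *current_ctx,
                 struct toy_filename *fn, const char *path)
{
    assert(path != NULL);
    fn->name = path;
    current_ctx->recorded_name = path;
}

/*
 * Models the audit lookup done during the open itself: only the
 * context that is current at that moment is consulted.
 */
const char *toy_audit_lookup(const struct toy_audit_ctx *current_ctx)
{
    return current_ctx->recorded_name;
}

/* Returns 1 if the worker-side lookup finds no name, 0 otherwise. */
int toy_worker_sees_null(void)
{
    struct toy_audit_ctx submitter = { NULL };
    struct toy_audit_ctx worker = { NULL };
    struct toy_filename fn = { NULL };

    toy_getname(&submitter, &fn, "/tmp/example");  /* prep context */
    return toy_audit_lookup(&worker) == NULL;      /* worker context */
}
```

In this model the name is recorded only in the submitter's context, so
the lookup from the worker's context comes back NULL, and handing that
result to something like strlen() would fault.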

I think this is related to another issue that Jens and I have been
discussing relating to connect() and sockaddrs.  We had been hoping
that the issue we were seeing with sockaddrs was just a special case
with connect, but it doesn't look like that is the case.

I'm going to be a bit busy this week with conferences, but given the
previous discussions with Jens as well as this new issue, I suspect
that we are going to need to do some work to support creation of a
thin, or lazily set up, audit_context that we can initialize in the
io_uring prep routines for use by getname(), move_addr_to_kernel(),
etc., store in the io_kiocb struct, and then fully set up in
audit_uring_entry().
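
A rough userspace sketch of that idea, again in C, might look like the
following.  This is only an illustration under stated assumptions: the
toy_* names are hypothetical, toy_req stands in for io_kiocb, and the
"full setup" step merely flips a flag where the real work in
audit_uring_entry() would happen.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a thin, lazily set up audit context. */
struct toy_audit_ctx {
    int fully_set_up;   /* 0 after prep, 1 once the worker finishes setup */
    const char *name;   /* filename info captured during prep */
};

/* Toy stand-in for io_kiocb: the request carries its own context. */
struct toy_req {
    struct toy_audit_ctx actx;
};

/* Prep step: initialize the thin context and record the name in it. */
void toy_prep_openat(struct toy_req *req, const char *path)
{
    assert(req != NULL && path != NULL);
    req->actx.fully_set_up = 0;
    req->actx.name = path;
}

/*
 * Issue step, possibly run later by a different thread: finish the
 * setup, then use the context stored in the request rather than
 * whatever context the worker itself happens to have.
 */
const char *toy_issue_openat(struct toy_req *req)
{
    assert(req != NULL);
    req->actx.fully_set_up = 1;
    return req->actx.name;
}
```

Because the context travels with the request, the name recorded at prep
time is still reachable when the worker issues the operation.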

> ? link_path_walk.part.0.constprop.0 (fs/namei.c:2324)
> path_openat (fs/namei.c:3550 fs/namei.c:3796)
> do_filp_open (fs/namei.c:3826)
> ? alloc_fd (./arch/x86/include/asm/paravirt.h:589 (discriminator 10)
> ./arch/x86/include/asm/qspinlock.h:57 (discriminator 10)
> ./include/linux/spinlock.h:204 (discriminator 10)
> ./include/linux/spinlock_api_smp.h:142 (discriminator 10)
> ./include/linux/spinlock.h:391 (discriminator 10) fs/file.c:553
> (discriminator 10))
> io_openat2 (io_uring/openclose.c:140)
> io_openat (io_uring/openclose.c:178)
> io_issue_sqe (io_uring/io_uring.c:1897)
> io_wq_submit_work (io_uring/io_uring.c:2006)
> io_worker_handle_work (io_uring/io-wq.c:540 io_uring/io-wq.c:597)
> io_wq_worker (io_uring/io-wq.c:258 io_uring/io-wq.c:648)
> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
> ? raw_spin_rq_unlock (./arch/x86/include/asm/paravirt.h:589
> ./arch/x86/include/asm/qspinlock.h:57 ./include/linux/spinlock.h:204
> ./include/linux/spinlock_api_smp.h:142 kernel/sched/core.c:603)
> ? finish_task_switch.isra.0 (./arch/x86/include/asm/irqflags.h:42
> ./arch/x86/include/asm/irqflags.h:77 kernel/sched/sched.h:1397
> kernel/sched/core.c:5163 kernel/sched/core.c:5281)
> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
> ret_from_fork (arch/x86/kernel/process.c:156)
> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
> ret_from_fork_asm (arch/x86/entry/entry_64.S:256)

-- 
paul-moore.com




