Re: [PATCH 09/10] kthread: Ensure struct kthread is present for all kthreads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nathan Chancellor <nathan@xxxxxxxxxx> writes:

> On Wed, Dec 22, 2021 at 12:30:57PM -0600, Eric W. Biederman wrote:
>> Nathan Chancellor <nathan@xxxxxxxxxx> writes:
>> 
>> > Hi Eric,
>> >
>> > On Wed, Dec 08, 2021 at 02:25:31PM -0600, Eric W. Biederman wrote:
>> >> Today the rules are a bit iffy and arbitrary about which kernel
>> >> threads have struct kthread present.  Both idle threads and thread
>> >> started with create_kthread want struct kthread present so that is
>> >> effectively all kernel threads.  Make the rule that if PF_KTHREAD
>> >> and the task is running then struct kthread is present.
>> >> 
>> >> This will allow the kernel thread code to using tsk->exit_code
>> >> with different semantics from ordinary processes.
>> >> 
>> >> To make ensure that struct kthread is present for all
>> >> kernel threads move it's allocation into copy_process.
>> >> 
>> >> Add a deallocation of struct kthread in exec for processes
>> >> that were kernel threads.
>> >> 
>> >> Move the allocation of struct kthread for the initial thread
>> >> earlier so that it is not repeated for each additional idle
>> >> thread.
>> >> 
>> >> Move the initialization of struct kthread into set_kthread_struct
>> >> so that the structure is always and reliably initailized.
>> >> 
>> >> Clear set_child_tid in free_kthread_struct to ensure the kthread
>> >> struct is reliably freed during exec.  The function
>> >> free_kthread_struct does not need to clear vfork_done during exec as
>> >> exec_mm_release called from exec_mmap has already cleared vfork_done.
>> >> 
>> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> >
>> > This patch as commit 40966e316f86 ("kthread: Ensure struct kthread is
>> > present for all kthreads") in -next causes an ARCH=arm
>> > multi_v5_defconfig kernel to fail to boot in QEMU. I had to apply commit
>> > 6692c98c7df5 ("fork: Stop protecting back_fork_cleanup_cgroup_lock with
>> > CONFIG_NUMA") to get it to build and I applied commit dd621ee0cf8e
>> > ("kthread: Warn about failed allocations for the init kthread") to avoid
>> > the known runtime warning.
>> >
>> > $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- distclean multi_v5_defconfig all
>> >
>> > $ qemu-system-arm \
>> >     -initrd rootfs.cpio \
>> >     -append earlycon \
>> >     -machine palmetto-bmc \
>> >     -no-reboot \
>> >     -dtb arch/arm/boot/dts/aspeed-bmc-opp-palmetto.dtb \
>> >     -display none \
>> >     -kernel arch/arm/boot/zImage \
>> >     -m 512m \
>> >     -nodefaults \
>> >     -serial mon:stdio
>> > qemu-system-arm: warning: nic ftgmac100.0 has no peer
>> > qemu-system-arm: warning: nic ftgmac100.1 has no peer
>> > Booting Linux on physical CPU 0x0
>> > Linux version 5.16.0-rc1-00016-g40966e316f86-dirty (nathan@archlinux-ax161) (arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 PREEMPT Wed Dec 22 18:08:53 UTC 2021
>> > CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00093177
>> > CPU: VIVT data cache, VIVT instruction cache
>> > OF: fdt: Machine model: Palmetto BMC
>> > earlycon: ns16550a0 at MMIO 0x1e784000 (options '')
>> > printk: bootconsole [ns16550a0] enabled
>> > Memory policy: Data cache writethrough
>> > cma: Reserved 16 MiB at 0x5b000000
>> > Zone ranges:
>> >   DMA      [mem 0x0000000040000000-0x000000005edfffff]
>> >   Normal   empty
>> >   HighMem  [mem 0x000000005ee00000-0x000000005fffffff]
>> > Movable zone start for each node
>> > Early memory node ranges
>> >   node   0: [mem 0x0000000040000000-0x000000005bffffff]
>> >   node   0: [mem 0x000000005c000000-0x000000005dffffff]
>> >   node   0: [mem 0x000000005e000000-0x000000005edfffff]
>> >   node   0: [mem 0x000000005ee00000-0x000000005fffffff]
>> > Initmem setup node 0 [mem 0x0000000040000000-0x000000005fffffff]
>> > Built 1 zonelists, mobility grouping on.  Total pages: 130084
>> > Kernel command line: earlycon
>> > Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
>> > Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
>> > mem auto-init: stack:off, heap alloc:off, heap free:off
>> > Memory: 433140K/524288K available (9628K kernel code, 2019K rwdata, 2368K rodata, 340K init, 661K bss, 74764K reserved, 16384K cma-reserved, 0K highmem)
>> > SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
>> > rcu: Preemptible hierarchical RCU implementation.
>> > rcu:    RCU event tracing is enabled.
>> >         Trampoline variant of Tasks RCU enabled.
>> > rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
>> > NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
>> > i2c controller registered, irq 16
>> > random: get_random_bytes called from start_kernel+0x408/0x624 with crng_init=0
>> > clocksource: FTTMR010-TIMER2: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635851949 ns
>> > sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 89478484971ns
>> > Switching to timer-based delay loop, resolution 41ns
>> > Console: colour dummy device 80x30
>> > printk: console [tty0] enabled
>> > printk: bootconsole [ns16550a0] disabled
>> >
>> > After that, it just hangs.
>> >
>> > The rootfs is available at https://github.com/ClangBuiltLinux/boot-utils
>> > in the images/arm folder.
>> >
>> > If there is any more information that I can provide or changes to test,
>> > please let me know.

I have managed to reproduce, fix and verify my fix, please
see below.


Subject: [PATCH] kthread: Never put_user the set_child_tid address

Kernel threads abuse set_child_tid.  Historically that has been fine
as set_child_tid was initialized after the kernel thread had been
forked.  Unfortunately storing struct kthread in set_child_tid after
the thread is running makes struct kthread being unusable for storing
result codes of the thread.

When set_child_tid is set to struct kthread during fork that results
in schedule_tail writing the thread id to the beggining of struct
kthread (if put_user does not realize it is a kernel address).

Solve this by skipping the put_user for all kthreads.

Reported-by: Nathan Chancellor <nathan@xxxxxxxxxx>
Link: https://lkml.kernel.org/r/YcNsG0Lp94V13whH@archlinux-ax161
Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee222b89c692..d8adbea77be1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4908,7 +4908,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	finish_task_switch(prev);
 	preempt_enable();
 
-	if (current->set_child_tid)
+	if (!(current->flags & PF_KTHREAD) && current->set_child_tid)
 		put_user(task_pid_vnr(current), current->set_child_tid);
 
 	calculate_sigpending();
-- 
2.29.2


Eric



[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux