On Tue, 2024-01-02 at 10:17 -0500, Steven Rostedt wrote:
> On Thu, 14 Dec 2023 00:24:21 +0100
> Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
>
> > Architectures use assembly code to initialize ftrace_regs and call
> > ftrace_ops_list_func(). Therefore, from the KMSAN's point of view,
> > ftrace_regs is poisoned on ftrace_ops_list_func entry(). This causes
> > KMSAN warnings when running the ftrace testsuite.
>
> BTW, why is this only a problem for s390 and no other architectures?
>
> If it is only a s390 thing, then we should do this instead:
>
> in include/linux/ftrace.h:
>
> /* Add a comment here to why this is needed */
> #ifndef ftrace_list_func_unpoison
> # define ftrace_list_func_unpoison(fregs) do { } while(0)
> #endif
>
> In arch/s390/include/asm/ftrace.h:
>
> /* Add a comment to why s390 is special */
> # define ftrace_list_func_unpoison(fregs) kmsan_unpoison_memory(fregs, sizeof(*fregs))
>
> >
> > Fix by trusting the architecture-specific assembly code and always
> > unpoisoning ftrace_regs in ftrace_ops_list_func.
> >
> > Acked-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
>
> I'm taking my ack away for this change in favor of what I'm
> suggesting now.
>
> > Reviewed-by: Alexander Potapenko <glider@xxxxxxxxxx>
> > Signed-off-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > ---
> >  kernel/trace/ftrace.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 8de8bec5f366..dfb8b26966aa 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> >  void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> >  			       struct ftrace_ops *op, struct ftrace_regs *fregs)
> >  {
> > +	kmsan_unpoison_memory(fregs, sizeof(*fregs));
>
> And here have:
>
> 	ftrace_list_func_unpoison(fregs);
>
> That way we only do it for archs that really need it, and do not
> affect archs that do not.
>
>
> I want to know why this only affects s390, because if we are just
> doing this because "it works", it could be just covering up a symptom
> of something else and not actually doing the "right thing".
>
>
> -- Steve
>
>
> >  	__ftrace_ops_list_func(ip, parent_ip, NULL, fregs);
> >  }
> >  #else
>

Ok, it has been a while, but I believe I have a good answer now.
KMSAN shadow for memory above $rsp is essentially random. Here is
an example (you'll need a GDB hack from [1] if you want to try this at
home):

(gdb) x/5i do_nanosleep
   0xffffffff843607c0 <do_nanosleep>:   call   0xffffffffc0201000

Thread 3 hit Breakpoint 1, 0xffffffffc0201000 in ?? ()
(gdb) x/64bx kmsan_get_metadata($rsp - 64, 0)
0xffffd1000087bd38:     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd40:     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd48:     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd50:     0x00 0x00 0x00 0x00 0xff 0xff 0xff 0xff
0xffffd1000087bd58:     0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd60:     0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0xffffd1000087bd68:     0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0xffffd1000087bd70:     0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff

So if assembly (in this case ftrace_regs_caller) allocates struct
pt_regs on stack, it may or may not be poisoned depending on what was
called before. So, by accident, on s390x it's poisoned and trips KMSAN,
and on x86_64 it's not. Based on this observation, I'd say we need an
unpoison call in all ftrace handlers (e.g., kprobe_ftrace_handler), and
not just this one.

But why is this the case? Kernel stacks are created by
alloc_thread_stack_node() using __vmalloc_node_range(__GFP_ZERO), so
they are fully unpoisoned. Then functions are called and return, their
locals are poisoned and unpoisoned. Interestingly enough, on return,
they are not poisoned back, even though

    commit 37ad4ee8364255c73026a3c343403b5977fa7e79
    Author: Alexander Potapenko <glider@xxxxxxxxxx>
    Date:   Thu Sep 15 17:04:13 2022 +0200

        x86: kmsan: don't instrument stack walking functions

says they do. So what if we introduce that [2]?

# echo "p:nanosleep do_nanosleep %di" >/sys/kernel/tracing/kprobe_events
# echo 1 >/sys/kernel/debug/tracing/events/kprobes/nanosleep/enable
# sleep 1
=====================================================
BUG: KMSAN: uninit-value in kprobe_ftrace_handler+0x5b9/0x790
 kprobe_ftrace_handler+0x5b9/0x790
 0xffffffffc02010de
 do_nanosleep+0x5/0x670
 hrtimer_nanosleep+0x169/0x3b0
 common_nsleep+0xc7/0x100
 __x64_sys_clock_nanosleep+0x4e2/0x650
 do_syscall_64+0x6e/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Local variable nd created at:
 do_filp_open+0x3b2/0x5e0

Quite similar to s390. Local variable nd is a random leftover from a
different call stack, which the modified instrumentation poisoned on
return from do_filp_open().

Alexander, what do you think about adding [2] upstream as an option
that can be enabled from the command line?

Also, what do you think about poisoning kernel stacks? Formally they
are zeroed out, but I think valid code has no business reading these
zeroes.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=31878
[2] https://github.com/iii-i/llvm-project/commits/msan-poison-allocas-before-returning-2024-06-12/
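
For illustration only, the stale-shadow effect discussed above can be
imitated with a toy user-space model. This is a sketch under loud
assumptions, not KMSAN code: the toy_* names, the 64-byte "stack", and
the one-shadow-byte-per-stack-byte layout are all made up for the
example. It only shows that a frame carved out by uninstrumented code
inherits whatever shadow earlier calls left behind, and that an
explicit unpoison in the handler removes the dependency on call
history.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_STACK_SIZE 64
#define TOY_POISON 0xff

/* Hypothetical shadow: 0x00 = initialized, 0xff = poisoned. */
static unsigned char toy_shadow[TOY_STACK_SIZE];

/* A fresh stack is zeroed, so its shadow starts fully unpoisoned. */
static void toy_reset(void)
{
	memset(toy_shadow, 0, sizeof(toy_shadow));
}

static void toy_poison(size_t off, size_t len)
{
	assert(off + len <= TOY_STACK_SIZE);
	memset(toy_shadow + off, TOY_POISON, len);
}

static void toy_unpoison(size_t off, size_t len)
{
	assert(off + len <= TOY_STACK_SIZE);
	memset(toy_shadow + off, 0, len);
}

static int toy_is_poisoned(size_t off, size_t len)
{
	assert(off + len <= TOY_STACK_SIZE);
	for (size_t i = 0; i < len; i++)
		if (toy_shadow[off + i])
			return 1;
	return 0;
}

/*
 * Model of an instrumented function with frame [off, off + len):
 * locals are poisoned on entry, the first `written` bytes are stored
 * to (which unpoisons them), and the whole frame is poisoned again on
 * return only if poison_on_return is set.
 */
static void toy_instrumented_call(size_t off, size_t len, size_t written,
				  int poison_on_return)
{
	assert(written <= len);
	toy_poison(off, len);
	toy_unpoison(off, written);
	if (poison_on_return)
		toy_poison(off, len);
}

/*
 * Model of an assembly trampoline that reuses the same stack range for
 * a register-save area without touching the shadow. Returns nonzero if
 * a handler reading that area would see poisoned bytes.
 */
static int toy_asm_frame_is_poisoned(size_t off, size_t len,
				     int handler_unpoisons)
{
	if (handler_unpoisons)
		toy_unpoison(off, len);
	return toy_is_poisoned(off, len);
}
```

In this model, whether toy_asm_frame_is_poisoned(off, len, 0) reports
poison depends entirely on the preceding toy_instrumented_call(): a
call that wrote all of its locals leaves clean shadow, one that left
some unwritten or poisoned on return leaves poison. Passing
handler_unpoisons = 1 gives the same answer regardless of history.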