> On Feb 15, 2019, at 2:15 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > On Fri, 15 Feb 2019 09:55:32 -0800 > Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > >>> On Fri, Feb 15, 2019 at 9:49 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: >>> >>> From: Changbin Du <changbin.du@xxxxxxxxx> >>> >>> The userspace can ask kprobe to intercept strings at any memory address, >>> including invalid kernel address. >> >> Again, this is not about a "kernel address" at all. >> >> It's neither a kernel address _nor_ a user address. It's an invalid >> address entirely, and there is nothing that makes it "kernel". >> >> Please don't talk about "invalid kernel addresses" when it is no such thing. >> > > Ah, I see what you are saying. The example of the bug that the patch > fixed used a non-canonical address, which is "garbage", and not kernel > or user space. Point taken. > > But the issue is deeper than that. This code never bugged until the > following commit was added: > > 9da3f2b7405 "x86/fault: BUG() when uaccess helpers fault on kernel addresses" > > Before that commit, this worked fine. > > In fact, we can trigger the same bug (with a slightly different > message), if we use a kernel space address. > > To test this, I added the following variable somewhere in the code: > > void *sdr_ptr = 0xffffffff70112200; > > And then did the following: > > # echo 'p:fault do_sys_open arg=+0(@sdr_ptr):string' > /debug/tracing/kprobe_events > > Which will read the sdr_ptr variable just like the example did with +0(+0(%si)):string. > But this time I control the what address it is triggered on. > > And it crashed with: > > [ 1447.876799] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess > > The above message in the bug in the patch was: > "BUG: GPF in non-whitelisted uaccess (non-canonical address?)" > > [ 1447.885053] BUG: unable to handle kernel paging request at ffffffff70112200 > [ 1447.891997] #PF error: [normal kernel read fault] > [ 1447.896695] PGD 8a212067 P4D 8a212067 PUD 0 > [ 1447.900679] BUG: pagefault on kernel address 0xffffffff70112200 in non-whitelisted uaccess > [ 1447.900967] Oops: 0000 [#1] PREEMPT SMP PTI > [ 1447.913394] CPU: 7 PID: 1688 Comm: cat Not tainted 5.0.0-rc5-test+ #28 > [ 1447.919905] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 > [ 1447.928843] RIP: 0010:process_fetch_insn+0x1a0/0x470 > [ 1447.933804] Code: ff f0 80 48 03 80 83 80 18 1a 00 00 01 31 c9 eb 10 48 83 c2 01 85 c0 75 1f 81 f9 ff 0f 00 00 7f 17 0f 1f 00 0f ae e8 44 89 e0 <40> 8a 32 0f 1f 00 83 c1 01 40 84 f6 75 d9 65 48 8b 14 25 00 5c 01 > [ 1447.952520] RSP: 0018:ffffb77b80673d08 EFLAGS: 00010246 > [ 1447.957736] RAX: 0000000000000000 RBX: ffff91bfb7ab2c30 RCX: 0000000000000000 > [ 1447.964857] RDX: ffffffff70112200 RSI: ffffffff97267a80 RDI: 00007ffffffff000 > [ 1447.971981] RBP: 0000000000000000 R08: ffffffff70112200 R09: ffff91c014518000 > [ 1447.979105] R10: ffff91c01451801c R11: ffff91c0185e5800 R12: 0000000000000000 > [ 1447.986226] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 1447.993342] FS: 0000000000000000(0000) GS:ffff91c01a9c0000(0000) knlGS:0000000000000000 > [ 1448.001417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1448.007157] CR2: ffffffff70112200 CR3: 00000000b3f32004 CR4: 00000000001606e0 > [ 1448.014280] Call Trace: > [ 1448.016737] kprobe_trace_func+0x278/0x360 > [ 1448.020834] ? preempt_count_sub+0x98/0xe0 > [ 1448.024931] ? do_sys_open+0x5/0x220 > [ 1448.028503] kprobe_dispatcher+0x36/0x50 > [ 1448.032426] ? do_sys_open+0x1/0x220 > [ 1448.036002] kprobe_ftrace_handler+0xb5/0x120 > [ 1448.040356] ftrace_ops_assist_func+0x87/0x110 > [ 1448.044797] 0xffffffffc06a30bf > [ 1448.047939] ? __ia32_sys_open+0x20/0x20 > [ 1448.051860] ? do_syscall_64+0x1c/0x160 > [ 1448.055694] ? do_sys_open+0x1/0x220 > [ 1448.059268] do_sys_open+0x5/0x220 > [ 1448.062672] do_syscall_64+0x60/0x160 > [ 1448.066335] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > Which triggered on the address: 0xffffffff70112200 > > Which is perfectly canonical. > > The reason I use "kernel address" is because this became an issue after > "x86/fault: BUG() when uaccess helpers fault on kernel addresses" was > added and that explicitly states "kernel address". > > That patch added: > > + /* > + * This is a faulting memory access in kernel space, on a kernel > + * address, in a usercopy function. This can e.g. be caused by improper > + * use of helpers like __put_user and by improper attempts to access > + * userspace addresses in KERNEL_DS regions. > + * The one (semi-)legitimate exception are probe_kernel_{read,write}(), > + * which can be invoked from places like kgdb, /dev/mem (for reading) > + * and privileged BPF code (for reading). > + * The probe_kernel_*() functions set the kernel_uaccess_faults_ok flag > + * to tell us that faulting on kernel addresses, and even noncanonical > + * addresses, in a userspace accessor does not necessarily imply a > + * kernel bug, root might just be doing weird stuff. > + */ > + if (current->kernel_uaccess_faults_ok) > + return false; > + > + /* This is bad. Refuse the fixup so that we go into die(). */ > + if (trapnr == X86_TRAP_PF) { > + pr_emerg("BUG: pagefault on kernel address 0x%lx in non-whitelisted uaccess\n", > + fault_addr); > + } else { > + pr_emerg("BUG: GPF in non-whitelisted uaccess (non-canonical address?)\n"); > + } > > Where this path triggered by the kprobe using copy_from_user(). So > yeah, it can happen if it triggered on a non-canonical address (which is > just garbage), but it can also trigger if it's a kernel address that > isn't mapped either. > > So the comment in the code is not 100% accurate, because it isn't just > a kernel address, but also a non-canonical one. Something tells me that > the change log of the commit that checks this isn't written well > either. What exactly can't be done now with uaccess code? I'm guessing > that it should only be allowed to fault on user space addresses? So > should I change this commit log to: > > kprobe: Do not use uaccess function to access non-user address that can fault > > And change all the "kernel address" mentions to "non-user address" as > non-user covers both kernel address and non-canonical? > > > -- Steve > > This patch allows me to modify the sdr_ptr as well from the tracing directory: > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index 2cf3c747a357..292fe2ef0a45 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -7928,6 +7928,45 @@ static const struct file_operations buffer_percent_fops = { > .llseek = default_llseek, > }; > > +void *sdr_ptr = 0xffffffff70112200; > + > +static ssize_t > +sdr_ptr_read(struct file *filp, char __user *ubuf, > + size_t cnt, loff_t *ppos) > +{ > + char buf[64]; > + int r; > + > + r = sprintf(buf, "%px\n", sdr_ptr); > + > + return simple_read_from_buffer(ubuf, cnt, ppos, buf, r); > +} > + > +static ssize_t > +sdr_ptr_write(struct file *filp, const char __user *ubuf, > + size_t cnt, loff_t *ppos) > +{ > + unsigned long val; > + int ret; > + > + ret = kstrtoul_from_user(ubuf, cnt, 0, &val); > + if (ret) > + return ret; > + > + sdr_ptr = val; > + > + (*ppos)++; > + > + return cnt; > +} > + > +static const struct file_operations sdr_ptr_fops = { > + .open = tracing_open_generic, > + .read = sdr_ptr_read, > + .write = sdr_ptr_write, > + .llseek = default_llseek, > +}; > + > struct dentry *trace_instance_dir; > > static void > @@ -8188,6 +8227,9 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer) > struct trace_event_file *file; > int cpu; > > + trace_create_file("sdr_ptr", 0644, d_tracer, > + NULL, &sdr_ptr_fops); > + > trace_create_file("available_tracers", 0444, d_tracer, > tr, &show_traces_fops); > I’m missing most of the context here, but even probe_kernel_...() is unwise for a totally untrustworthy address. It could be MMIO, for example. If needed, we could come up with a safe-ish helper for tracing. For direct-map addresses, probe_kernel_...() is probably okay. Same for the current stack. Otherwise we could walk the page tables and check that the address is cacheable, I suppose, although this is slightly dubious if we don’t also check MTRRs. We could also check that the PA is in main memory, I suppose, although this may have unfortunate interactions with the MCE code.