Re: [RFC PATCH v4 4/4] tracing: Add might_fault() check in __DO_TRACE() for syscall

Jordan Rife <jrife@xxxxxxxxxx> · Wed, 30 Oct 2024 10:34:40 -0700

> > [  687.334265][T16276] allocated by task 16281 on cpu 1 at 683.953385s (3.380878s ago):
> > [  687.335615][T16276]  tracepoint_add_func+0x28a/0xd90
> > [  687.336424][T16276]  tracepoint_probe_register_prio_may_exist+0xa2/0xf0
> > [  687.337416][T16276]  bpf_probe_register+0x186/0x200
> > [  687.338174][T16276]  bpf_raw_tp_link_attach+0x21f/0x540
> > [  687.339233][T16276]  __sys_bpf+0x393/0x4fa0
> > [  687.340042][T16276]  __x64_sys_bpf+0x78/0xc0
> > [  687.340801][T16276]  do_syscall_64+0xcb/0x250
> > [  687.341623][T16276]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> I think the stack trace points out that the patch [1] isn't really fixing it.
> UAF is on access to bpf_link in __traceiter_sys_enter

The stack trace points to the memory in question being allocated by
tracepoint_add_func where allocation and assignment to
__tracepoint_sys_enter->funcs happens. Mathieu's patch addresses
use-after-free on this structure by using call_rcu_tasks_trace inside
release_probes. In contrast, here is what the "allocated by" trace
looks like for UAF on access to bpf_link (copied from the original
KASAN crash report [4]).

> Allocated by task 5681:
>  kasan_save_stack mm/kasan/common.c:47 [inline]
>  kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
>  poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
>  __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
>  kasan_kmalloc include/linux/kasan.h:260 [inline]
>  __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4304
>  kmalloc_noprof include/linux/slab.h:901 [inline]
>  kzalloc_noprof include/linux/slab.h:1037 [inline]
>  bpf_raw_tp_link_attach+0x2a0/0x6e0 kernel/bpf/syscall.c:3829
>  bpf_raw_tracepoint_open+0x177/0x1f0 kernel/bpf/syscall.c:3876
>  __sys_bpf+0x3c0/0x810 kernel/bpf/syscall.c:5691
>  __do_sys_bpf kernel/bpf/syscall.c:5756 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:5754 [inline]
>  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5754
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f

This clearly points to where memory for a bpf_link is allocated.

> link = kzalloc(sizeof(*link), GFP_USER);
> if (!link) {
>     err = -ENOMEM;
>     goto out_put_btp;
> }

To add some context, if I apply Mathieu's patch alone then I get no
meaningful test signal when running the reproducer, because the UAF
crash almost always occurs first on accesses to bpf_link or bpf_prog
showing a trace like the second one above. My intent in applying patch
[1] is to mask out these sources of UAF-related crashes on the BPF
side to just focus on what this series addresses. This series should
eventually be tested end-to-end with Andrii's fix for the BPF stuff
that he mentioned last week, but that would rely on this patch series,
tracepoint_is_faultable() in particular, so it's kind of chicken and
egg in terms of testing. In the meantime, [1] provides a bandaid to
allow some degree of test coverage on this patch.

> while your patch [1] and all attempts to "fix" were delaying bpf_prog.
> The issue is not reproducing anymore due to luck.

[1] chains call_rcu_tasks_trace and call_rcu to free both bpf_prog and
bpf_link after unregistering the trace point. This grace period should
be sufficient to prevent UAF on these structures from the syscall TP
handlers which are protected with rcu_read_lock_trace. I've run the
reproducer many times. Without /some/ fix on the BPF side it crashes
reliably within seconds here. Using call_rcu_tasks_trace or chaining
it with call_rcu eliminates UAF on the BPF stuff which eliminates a
couple of variables for local testing.

If you are not convinced, I'm happy to run through other test
scenarios or run the reproducer for much longer.

-Jordan