Re: KASAN: use-after-free Read in ___bpf_prog_run

Yonghong Song <yhs@xxxxxxxx> · Mon, 16 Jan 2023 23:50:15 -0800

On 1/16/23 4:45 AM, Hao Sun wrote:

On 12 Jan 2023, at 2:59 PM, Yonghong Song <yhs@xxxxxxxx> wrote:

On 1/9/23 5:21 AM, Hao Sun wrote:
Yonghong Song <yhs@xxxxxxxx> 于2022年12月18日周日 00:57写道：

On 12/16/22 10:54 PM, Hao Sun wrote:

On 17 Dec 2022, at 1:07 PM, Yonghong Song <yhs@xxxxxxxx> wrote:

On 12/14/22 11:49 PM, Hao Sun wrote:
Hi,
The following KASAN report can be triggered by loading and test
running this simple BPF prog with a random data/ctx:
0: r0 = bpf_get_current_task_btf      ;
R0_w=trusted_ptr_task_struct(off=0,imm=0)
1: r0 = *(u32 *)(r0 +8192)       ;
R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
2: exit
I've simplified the C reproducer but didn't find the root cause.
JIT was disabled, and the interpreter triggered UAF when executing
the load insn. A slab-out-of-bound read can also be triggered:
https://pastebin.com/raw/g9zXr8jU
This can be reproduced on:
HEAD commit: b148c8b9b926 selftests/bpf: Add few corner cases to test
padding handling of btf_dump
git tree: bpf-next
console log: https://pastebin.com/raw/1EUi9tJe
kernel config: https://pastebin.com/raw/rgY3AJDZ
C reproducer: https://pastebin.com/raw/cfVGuCBm

I I tried with your above kernel config and C reproducer and cannot reproduce the kasan issue you reported.

[root@arch-fb-vm1 bpf-next]# ./a.out
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (85) call bpf_get_current_task_btf#158     ; R0_w=trusted_ptr_task_struct(off=0,imm=0)
1: (61) r0 = *(u32 *)(r0 +8192)       ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
2: (95) exit
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

prog fd: 3
[root@arch-fb-vm1 bpf-next]#

Your config indeed has kasan on.

Hi,

I can still reproduce this on a latest bpf-next build: 0e43662e61f25
(“tools/resolve_btfids: Use pkg-config to locate libelf”).
The simplified C reproducer sometime need to be run twice to trigger
the UAF. Also note that interpreter is required. Here is the original
C reproducer that loads and runs the BPF prog continuously for your
convenience:
https://pastebin.com/raw/WSJuNnVU

I still cannot reproduce with more than 10 runs. The config has jit off
so it already uses interpreter. It has kasan on as well.
# CONFIG_BPF_JIT is not set

Since you can reproduce it, I guess it would be great if you can
continue to debug this.

The load insn ‘r0 = *(u32*) (current + 8192)’ is OOB, because sizeof(task_struct)
is 7240 as shown in KASAN report. The issue is that struct task_struct is special,
its runtime size is actually smaller than it static type size. In X86:
task_struct->thread_struct->fpu->fpstate->union fpregs_state is
/*
* ...
* The size of the structure is determined by the largest
* member - which is the xsave area. The padding is there
* to ensure that statically-allocated task_structs (just
* the init_task today) have enough space.
*/
union fpregs_state {
struct fregs_state fsave;
struct fxregs_state fxsave;
struct swregs_state soft;
struct xregs_state xsave;
u8 __padding[PAGE_SIZE];
};
In btf_struct_access(), the resolved size for task_struct is 10496, much bigger
than its runtime size, so the prog in reproducer passed the verifier and leads
to the oob. This can happen to all similar types, whose runtime size is smaller
than its static size.
Not sure how many similar cases are there, maybe special check to task_struct
is enough. Any hint on how this should be addressed?

This should a corner case, I am not aware of other allocations like this.

For a normal program, if the access chain looks
like
task_struct->thread_struct->fpu->fpstate->fpregs_state->{fsave,fxsave, soft, xsave},
we should not hit this issue. So I think we don't need to address this
issue in kernel. The test itself should filter this out.

Maybe I didn’t make my point clear. The issue here is that the runtime size
of task_struct is `arch_task_struct_size`, which equals to the following,
see fpu__init_task_struct_size():

sizeof(task_struct) - sizeof(fpregs_state) + fpu_kernel_cfg.default_size

However, the verifier validates task_struct access based on the ty info in
btf_vmlinux, which equals to sizeof(task_struct), much bigger than `arch_
task_struct_size` due to the `__padding[PAGE_SIZE]` in fpregs_state, this
leads to OOB access.

We should not allow progs that access the task_struct beyond `arch_task_
struct_size` to be loaded. So maybe we should add a fixup in `btf_parse_
vmlinux()` to assign the correct size to each type of this chain:
task_struct->thread_struct->fpu->fpstate->fpregs_state;
Or maybe we should fix this during generating BTF of vmlinux?

Thanks for detailed explanation. I do understand the problem with a 
slight caveat. I mentioned if the user doing a proper access in C
code like

task_struct->thread_struct->fpu->fpstate->fpregs_state->{fsave,fxsave, 
soft, xsave}
we should not hit the problem if the bpf program is written to
retrieve *correct* and *expected* data.

But if the bpf program tries to access the data beyond the *correct*
boundary (i.e. beyond fpu__init_task_struct_size()), uninit mem access
might happen.

Since fpu__init_task_struct_size() might be different for different
arch features, we cannot change vmlinux BTF. We could add a check in
btf_struct_access such that if
  . btf_id references to struct task_struct
  . walked offset > arch_task_struct_size
reject the program with a verification failure.
But I kind of don't like this special case as it only happens with
*incorrect* program. Note that we won't have fault as the memory
read is protected by bpf exception handling.