Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> writes:

> BPF programs that are loaded by privileged users (with CAP_BPF and
> CAP_PERFMON) are allowed to be non-confidential. This means that they
> can read arbitrary kernel memory, and also communicate kernel pointers
> through maps and other channels of communication from BPF programs to
> applications running in userspace.
>
> This is a critical use case for applications that implement kernel
> tracing, and observability functionality using BPF programs, and
> provides users with much needed visibility and context into a running
> kernel.
>
> There are two supported methods of such kernel memory "probing", using
> bpf_probe_read_kernel (and related) helpers, or using direct load
> instructions of untrusted kernel memory (e.g. arguments to tracepoint
> programs, through bpf_core_cast casting, etc.).
>
> For direct load instructions on untrusted kernel pointers, the verifier
> converts these to PROBE_MEM loads, and the JIT handles these loads by
> adding a bounds check and handling exceptions on page faults (when
> reading invalid kernel memory).
>
> So far, the implementation of PROBE_MEM (particularly on x86) has relied
> on bounds check because it needs to protect the BPF program from reading
> user addresses. Loads for such addresses will lead to a kernel panic
> due to panic in do_user_addr_fault, because the page fault on accessing
> userspace address in kernel mode will be unhandled.
>
> This patch instead proposes to do exception handling in
> do_user_addr_fault when user addresses are accessed by a BPF program,
> and when SMAP is enabled on x86. This would obviate the need for the BPF
> JIT to emit bounds checking for PROBE_MEM load instructions, and any
> invalid memory accesses (either for user addresses or unmapped kernel
> addresses) will be handled by the page fault handler.
>
> This set does not grant programs any additional privileges than those
> they already had. Instead, it optimizes the common case of doing loads
> on valid kernel memory, while shifting the cost to cases where invalid
> kernel memory is accessed without sanitization by a program.
>
> Changelog:
> ----------
> v1 -> v2
> v1: https://lore.kernel.org/bpf/20240515233932.3733815-1-memxor@xxxxxxxxx
>
>  * Rebase on bpf-next
>
> Kumar Kartikeya Dwivedi (2):
>   x86: Perform BPF exception fixup in do_user_addr_fault
>   bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
>
>  arch/x86/mm/fault.c         | 11 +++++++++++
>  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
>
> base-commit: f6afdaf72af7583d251bd569ded8d7d1eeb849c2
> --
> 2.43.0

We can also do something like this for ARM64 when PAN (Privileged Access
Never) is available. And if we are doing it then for RISC-V we can
remove this bounds checking completely because RISC-V always traps when
kernel accesses userspace addresses outside of uaccess routines.

But I am curious to know what other developers think about this.

Acked-by: Puranjay Mohan <puranjay@xxxxxxxxxx>

Thanks,
Puranjay