Re: [PATCH bpf-next v2 0/2] Zero overhead PROBE_MEM

Puranjay Mohan <puranjay@xxxxxxxxxx> · Wed, 19 Jun 2024 11:36:20 +0000

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> writes:

> BPF programs that are loaded by privileged users (with CAP_BPF and
> CAP_PERFMON) are allowed to be non-confidential. This means that they
> can read arbitrary kernel memory, and also communicate kernel pointers
> through maps and other channels of communication from BPF programs to
> applications running in userspace.
>
> This is a critical use case for applications that implement kernel
> tracing and observability functionality using BPF programs, and
> provides users with much-needed visibility and context into a running
> kernel.
>
> There are two supported methods of such kernel memory "probing": using
> the bpf_probe_read_kernel (and related) helpers, or using direct load
> instructions on untrusted kernel memory (e.g. arguments to tracepoint
> programs, through bpf_core_cast casting, etc.).
>
> For direct load instructions on untrusted kernel pointers, the verifier
> converts these to PROBE_MEM loads, and the JIT handles these loads by
> adding a bounds check and handling exceptions on page faults (when
> reading invalid kernel memory).
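For illustration, here is a minimal userspace model of the PROBE_MEM semantics described above. It is a sketch, not the JIT's actual output: HYPOTHETICAL_USER_LIMIT, KBASE, fake_load() and probe_mem_load() are made-up names, and the "page fault" is simulated by a range check on a small array. What it does reflect is the observable behaviour: a load from the user range is skipped by the bounds check, and a faulting load of unmapped kernel memory leaves 0 in the destination register, as the BPF exception handler does on a fault.

```c
#include <assert.h>   /* for the checks in the usage example */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of a PROBE_MEM load; NOT kernel or JIT code.
 * HYPOTHETICAL_USER_LIMIT stands in for the user/kernel split that the
 * x86 JIT compares against; the value here is arbitrary.
 */
#define HYPOTHETICAL_USER_LIMIT 0x1000u

/* Fake "kernel memory": only [KBASE, KBASE + sizeof(kmem)) is mapped. */
#define KBASE 0x8000u
static const uint64_t kmem[4] = { 11, 22, 33, 44 };

/* Returns false to simulate a page fault on an unmapped address. */
static bool fake_load(uint64_t addr, uint64_t *out)
{
	if (addr < KBASE || addr - KBASE > sizeof(kmem) - sizeof(*out))
		return false;
	memcpy(out, (const char *)kmem + (addr - KBASE), sizeof(*out));
	return true;
}

/*
 * PROBE_MEM load with a user-range bounds check of the kind the x86 JIT
 * emits: addresses below the limit skip the load and produce 0. Any other
 * faulting load is "fixed up" by returning 0, mirroring how the BPF
 * exception handler zeroes the destination register and resumes after
 * the load.
 */
static uint64_t probe_mem_load(uint64_t addr)
{
	uint64_t val;

	if (addr < HYPOTHETICAL_USER_LIMIT)
		return 0;	/* bounds check failed: dst = 0, no load issued */
	if (!fake_load(addr, &val))
		return 0;	/* simulated fault: exception fixup, dst = 0 */
	return val;
}

/*
 * Usage: probe_mem_load(KBASE + 8) yields 22, while probe_mem_load(0x10)
 * (user range) and probe_mem_load(0x9000) (unmapped) both yield 0.
 */
```

The series' point is that, with SMAP, the first "if" can go away: the user-range case then also ends up in the fault path, at the cost of taking a fault only when the address is actually bad.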
>
> So far, the implementation of PROBE_MEM (particularly on x86) has relied
> on bounds checks because it needs to protect the BPF program from reading
> user addresses. Loads from such addresses lead to a kernel panic in
> do_user_addr_fault, because a page fault on a userspace address accessed
> in kernel mode is left unhandled.
>
> This patch instead proposes to do exception handling in
> do_user_addr_fault when user addresses are accessed by a BPF program,
> and when SMAP is enabled on x86. This would obviate the need for the BPF
> JIT to emit bounds checking for PROBE_MEM load instructions, and any
> invalid memory accesses (either for user addresses or unmapped kernel
> addresses) will be handled by the page fault handler.
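The before/after difference in the fault path can be modeled roughly as follows. This is a loose sketch, not the code in arch/x86/mm/fault.c: struct fault_ctx, decide_before() and decide_after() are hypothetical, and the real handler inspects the error code, EFLAGS.AC and the exception tables rather than a set of booleans.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of a fault taken on a user address (the case that
 * reaches do_user_addr_fault). Field names are illustrative only.
 */
struct fault_ctx {
	bool user_mode;     /* fault raised from user mode? */
	bool user_address;  /* faulting address lies in the user range? */
	bool smap_enabled;  /* SMAP active on this CPU? */
	bool ac_flag;       /* EFLAGS.AC set, i.e. inside a uaccess window? */
	bool ip_in_bpf;     /* faulting IP is a BPF PROBE_MEM load with a fixup entry? */
};

enum fault_outcome {
	FAULT_NORMAL,     /* regular fault handling continues */
	FAULT_OOPS,       /* kernel-mode SMAP violation: oops/panic */
	FAULT_BPF_FIXUP,  /* zero the BPF dst register and skip the load */
};

/* Before the series: a kernel-mode SMAP violation always oopses. */
static enum fault_outcome decide_before(const struct fault_ctx *f)
{
	if (f->user_address && f->smap_enabled && !f->user_mode && !f->ac_flag)
		return FAULT_OOPS;
	return FAULT_NORMAL;
}

/* With the series: the same violation from a BPF load is fixed up instead. */
static enum fault_outcome decide_after(const struct fault_ctx *f)
{
	if (f->user_address && f->smap_enabled && !f->user_mode && !f->ac_flag)
		return f->ip_in_bpf ? FAULT_BPF_FIXUP : FAULT_OOPS;
	return FAULT_NORMAL;
}
```

Only the BPF case changes; a stray kernel access to a user address still oopses exactly as before, which is consistent with the claim that no new privileges are granted.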
>
> This set does not grant programs any privileges beyond those they
> already had. Instead, it optimizes the common case of doing loads
> on valid kernel memory, while shifting the cost to cases where invalid
> kernel memory is accessed without sanitization by a program.
>
> Changelog:
> ----------
> v1 -> v2
> v1: https://lore.kernel.org/bpf/20240515233932.3733815-1-memxor@xxxxxxxxx
>
>  * Rebase on bpf-next
>
> Kumar Kartikeya Dwivedi (2):
>   x86: Perform BPF exception fixup in do_user_addr_fault
>   bpf, x86: Skip bounds checking for PROBE_MEM with SMAP
>
>  arch/x86/mm/fault.c         | 11 +++++++++++
>  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
>
> base-commit: f6afdaf72af7583d251bd569ded8d7d1eeb849c2
> -- 
> 2.43.0

We can also do something like this for ARM64 when PAN (Privileged Access
Never) is available. And if we do that, then on RISC-V we can remove this
bounds check completely, because RISC-V always traps when the kernel
accesses userspace addresses outside of uaccess routines.
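The per-architecture idea above could be summarised as a decision the JIT makes when emitting a PROBE_MEM load. This is purely hypothetical: neither probe_mem_needs_bounds_check() nor this enum arch exists in the kernel, and it assumes the arm64 and RISC-V fault handlers would also gain a BPF exception fixup similar to the x86 patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical architecture tag; not a kernel type. */
enum arch { ARCH_X86_64, ARCH_ARM64, ARCH_RISCV64 };

/*
 * Hypothetical helper: must the JIT still emit a user-range bounds check
 * for PROBE_MEM loads? hw_user_access_guard means SMAP on x86-64 and PAN
 * on arm64; it is ignored on RISC-V. Mirrors the proposals in this thread,
 * not merged code.
 */
static bool probe_mem_needs_bounds_check(enum arch a, bool hw_user_access_guard)
{
	switch (a) {
	case ARCH_X86_64:	/* this series: skip the check when SMAP is on */
	case ARCH_ARM64:	/* suggested above: skip the check when PAN is on */
		return !hw_user_access_guard;
	case ARCH_RISCV64:	/* kernel traps on user accesses outside uaccess */
		return false;
	}
	return true;		/* unknown architecture: keep the check */
}
```

Keeping the check as the default for an unknown architecture errs on the safe side: without a guaranteed trap, dropping it would reintroduce the unhandled-fault problem described in the cover letter.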

But I am curious to know what other developers think about this.

Acked-by: Puranjay Mohan <puranjay@xxxxxxxxxx>

Thanks,
Puranjay