Re: [PATCH bpf-next v3 1/2] x86: Perform BPF exception fixup in do_user_addr_fault

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Mon, 4 Nov 2024 11:50:46 -0600

On Mon, 4 Nov 2024 at 11:16, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 11/3/24 11:35, Kumar Kartikeya Dwivedi wrote:
> > Currently, on x86, when SMAP is enabled, and a page fault occurs in
> > kernel mode for accessing a user address, the kernel will rightly panic
> > as no valid kernel code can cause such a page fault (unless buggy).
> > There is no valid correct kernel code that can generate such a fault,
> > therefore this behavior would be correct.
> >
> > BPF programs that currently encounter user addresses when doing
> > PROBE_MEM loads (load instructions which are allowed to read any kernel
> > address, only available for root users) avoid a page fault by performing
> > bounds checking on the address.  This requires the JIT to emit a jump
> > over each PROBE_MEM load instruction to avoid hitting page faults.
>
> To be honest, I think the overhead (and complexity) is in the right spot
> today: the BPF program.  I don't want to complicate the x86 page fault
> handler for a debugging feature that already has a functional solution
> today.

This is not a debugging feature in BPF programs. Almost all tracing
BPF programs that potentially execute inside the kernel contain loads
which have to be marked PROBE_MEM because potentially, they may read a
user pointer and dereference it. The BPF runtime relies on PROBE_MEM
whenever reading from  pointers (for programs having root) that cannot
be trusted to be safe statically. While reading invalid memory is a
rare case that should not happen, it may be possible if some kernel
field contains a stale address etc. so loads into such pointers are
allowed statically, and handled/trapped at runtime to zero out the
destination register. So far x86 won't invoke fixup_exception for user
faults, as it wasn't supposed to happen, which is correct behavior
(hence the bounds check).

However, this cost is paid to handle a rare case in the common path of
BPF programs.
It would be nice to avoid it, by deferring to
fixup_exception->ex_handler_bpf in case the pc belongs to BPF text.
This was the least intrusive way I could think of doing it.

On arm64, the fault handler would do exception handling in such a
case. So with this patch we can think of removing the bounds checking
across architectures. We can do something similar for RISC-V as well.

I am fine with changing the commit log/title to reflect that it's an
added cost for the fault handler. I felt that it wouldn't matter
because the kernel was about to panic, so we had bailed anyway, but we
_do_ end up spending extra cycles in case this path is hit now.