Re: [PATCH bpf-next v3 2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP

On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> >  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 06b080b61aa5..7e3bd589efc3 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1954,8 +1954,8 @@ st:                     if (is_imm8(insn->off))
> >               case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> >                       insn_off = insn->off;
> >
> > -                     if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > -                         BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > +                     if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > +                          BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> >                               /* Conservatively check that src_reg + insn->off is a kernel address:
> >                                *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> >                                *   and
>
> Well, I can see why you'd want to get rid of that, that's quite
> dreadful code you generate there.
>
> Can't you do something like:
>
>   lea off(%src), %r10
>   mov %r10, %r11
>   inc %r10
>   sar $63, %r11
>   and %r11, %r10
>   dec %r10
>
>   mov (%r10), %rax

That's Linus's hack from mask_user_address(), and
earlier from valid_user_address().
I don't think it works here because of
#define VSYSCALL_ADDR (-10UL << 20)

We had to filter out that range.
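
In C, Peter's sequence above is roughly this (my paraphrase, untested):

	unsigned long addr = src + off;
	long mask = (long)addr >> 63;	/* all ones iff bit 63 is set */

	addr = ((addr + 1) & mask) - 1;	/* kernel addr unchanged,
					 * user addr -> -1UL, which is in
					 * the kernel half and is
					 * guaranteed to fault, so the
					 * extable fixup applies */
	val = *(unsigned long *)addr;

VSYSCALL_ADDR is 0xffffffffff600000, which has bit 63 set, so the mask
passes it straight through to the load, while the fault handler treats
a fault on it as a user address fault.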

I don't understand why valid_user_address() is not broken,
since the fault handler considers a vsyscall address to be a user
address in fault_in_kernel_space(), and the user-address fault path
doesn't have the extable handling logic.
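
For reference, fault_in_kernel_space() in arch/x86/mm/fault.c looks
like this (quoting from memory, so details may be off):

static int fault_in_kernel_space(unsigned long address)
{
	/*
	 * On 64-bit systems, the vsyscall page is at an address above
	 * TASK_SIZE_MAX, but is not considered part of the kernel
	 * address space.
	 */
	if (IS_ENABLED(CONFIG_X86_64) && is_vsyscall_vaddr(address))
		return false;

	return address >= TASK_SIZE_MAX;
}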

> I realize that's not exactly pretty either, but no jumps. Not sure
> this'll help much if anything with the TDX thing though.

To clarify... this is not bpf specific. This bpf JIT logic is
nothing but an inlined version of copy_from_kernel_nofault().
So if confidential computing has an issue, lots of places are affected.
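
For context, the range checks the JIT inlines mirror x86's
copy_from_kernel_nofault_allowed() (paraphrased from memory):

bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
	unsigned long vaddr = (unsigned long)unsafe_src;

	/* Reject userspace addresses and the guard page above them */
	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
		return false;

	/* Reject the vsyscall page: it sits above TASK_SIZE_MAX but
	 * the fault handler treats it as a user address */
	if (is_vsyscall_vaddr(vaddr))
		return false;

	return __is_canonical_address(vaddr, boot_cpu_data.x86_virt_bits);
}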

So this patch set is the preferred way to accelerate this
inlined copy_from_kernel_nofault().
If it lands, we can follow up and optimize copy_from_kernel_nofault()
itself with cpu_feature_enabled(X86_FEATURE_SMAP) as well (rough
sketch below).
Though I'm not sure whether slab's get_freepointer_safe() cares
that much about saving nanoseconds. But bpf does.
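
Such a follow-up could look something like this, as a purely
hypothetical sketch (not a posted patch), where the SMAP fast path
would still have to keep the vsyscall filter:

bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
	unsigned long vaddr = (unsigned long)unsafe_src;

	/* Hypothetical: with SMAP a stray user access faults anyway
	 * and the extable fixup recovers, so only the vsyscall page
	 * (treated as a user address by the fault handler) still
	 * needs filtering. */
	if (cpu_feature_enabled(X86_FEATURE_SMAP))
		return !is_vsyscall_vaddr(vaddr);

	/* Otherwise fall back to the existing checks */
	if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
		return false;
	if (is_vsyscall_vaddr(vaddr))
		return false;
	return __is_canonical_address(vaddr, boot_cpu_data.x86_virt_bits);
}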




