On Tue, Nov 05, 2024 at 10:35:40AM -0800, Alexei Starovoitov wrote:
> On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> > >  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index 06b080b61aa5..7e3bd589efc3 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -1954,8 +1954,8 @@ st:			if (is_imm8(insn->off))
> > >  	case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> > >  		insn_off = insn->off;
> > >
> > > -		if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > -		    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > > +		if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > +		     BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> > >  			/* Conservatively check that src_reg + insn->off is a kernel address:
> > >  			 *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> > >  			 *   and
> >
> > Well, I can see why you'd want to get rid of that, that's quite
> > dreadful code you generate there.
> >
> > Can't you do something like:
> >
> >	lea	off(%src), %r10
> >	mov	%r10, %r11
> >	inc	%r10
> >	sar	$63, %r11
> >	and	%r11, %r10
> >	dec	%r10
> >
> >	mov	(%r10), %rax
>
> That's a Linus's hack for mask_user_address() and
> earlier in valid_user_address().

Yes, something along those lines. Preserves everything with MSB 1, and
maps the rest to ~0.

> I don't think it works because of
> #define VSYSCALL_ADDR (-10UL << 20)
>
> We had to filter out that range.

Range of _1_ page. Also, nobody should ever touch that page these days
anyway.

> I don't understand why valid_user_address() is not broken,
> since fault handler considers vsyscall address to be user addr
> in fault_in_kernel_space().
> And user addr faulting doesn't have extable handling logic.
The vsyscall page has its own magical exception handling that does
emulation.

> > I realize that's not exactly pretty either, but no jumps. Not sure
> > this'll help much if anything with the TDX thing though.
>
> to clarify... this is not bpf specific. This bpf JIT logic is
> nothing but inlined version of copy_from_kernel_nofault().
> So if confidential computing has an issue lots of pieces are affected.

It's not copy_from_kernel_nofault() that's a problem per se, it's
thinking it's 'safe' for random input that is.