On Thu, Jan 28, 2021 at 07:09:29PM -0800, Andy Lutomirski wrote: > > > > On Jan 28, 2021, at 6:33 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Thu, Jan 28, 2021 at 06:18:37PM -0800, Andy Lutomirski wrote: > >>> On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov > >>> <alexei.starovoitov@xxxxxxxxx> wrote: > >>> > >>> On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote: > >>>> > >>>> What exactly could the fault code even do to fix this up? Something like: > >>>> > >>>> if (addr == 0 && SMAP off && error_code says it's kernel mode && we > >>>> don't have permission to map NULL) { > >>>> special care for bpf; > >>>> } > >>> > >>> right. where 'special care' is checking extable and skipping single > >>> load instruction. > >>> > >>>> This seems arbitrary and fragile. And it's not obviously > >>>> implementable safely without taking locks, > >>> > >>> search_bpf_extables() needs rcu_read_lock() only. > >>> Not the locks you're talking about. > >> > >> I mean the locks in the if statement. How am I supposed to tell > >> whether this fault is a special bpf fault or a normal page fault > >> without taking a lock to look up the VMA or to do some other hack? > > > > search_bpf_extables() only needs a faulting rip. > > No need to lookup vma. > > if (addr == 0 && search_bpf_extables(regs->ip)... > > is trivial enough and won't penalize page faults in general. > > These conditions are not going to happen in the normal kernel code. > > The need to make this decision will happen in normal code. > > The page fault code is a critical piece of the kernel. The absolute top priority is correctness. Then performance for the actual common case, which is a regular page fault against a valid VMA. Everything else is lower priority. > > This means I’m not about to add an if (special bpf insn) before the VMA lookup. And by the time we do the VMA lookup, we have already lost. what's the difference between bpf prog and kprobe ? Not much from page fault pov. They both do things that are not normal for the rest of the kernel. if (kprobe_page_fault()) is handled first by the fault handler. So there is a precedent to handle special things. > You could try to play games with pagefault_disable(), but that will have its own problems. > > > > The faulting address and faulting ip will precisely identify this situation. > > There is no guess work. > > If there is a fault on an instruction with an exception handler and the faulting address is a valid user pointer, > it is absolutely ambiguous whether it’s BPF chasing a NULL pointer or something else following a valid user pointer. It's not ambiguous. Please take a look at search_bpf_extables. It's not like the whole bpf program is one special region. Only certain specific instructions are in extable. That table was dynamically generated by JIT. Just like kernel module's extables. bpf helpers are not in the extable. Most of the bpf prog code is not in it either.