Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 28 Jan 2021 18:18:37 -0800

On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> >
> > What exactly could the fault code even do to fix this up?  Something like:
> >
> > if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> > don't have permission to map NULL) {
> >   special care for bpf;
> > }
>
> right. where 'special care' is checking extable and skipping single
> load instruction.
>
> > This seems arbitrary and fragile.  And it's not obviously
> > implementable safely without taking locks,
>
> search_bpf_extables() needs rcu_read_lock() only.
> Not the locks you're talking about.

I mean the locks in the if statement.  How am I supposed to tell
whether this fault is a special bpf fault or a normal page fault
without taking a lock to look up the VMA or to do some other hack?
Also, why should BPF get special dispensation to generate a special
kind of nonsensical page fault that no other kernel code is allowed to
do?

>
> > which we really ought not
> > to be doing from inside arbitrary bpf programs.
>
> Not in arbitrary progs. It's only available to progs loaded by root.
>
> > Keep in mind that, if
> > SMAP is unavailable, the fault code literally does not know whether
> > the page fault originated form a valid uaccess region.
> >
> > There's also always the possibility that you simultaneously run bpf
> > and something like Wine or DOSEMU2 that actually maps the zero page,
> > in which case the effect of the bpf code is going to be quite erratic
> > and will depend on which mm is loaded.
>
> It's tracing. Folks who write those progs accepted the fact that
> the data returned by probe_read is not going to be 100% accurate.

Is this really all tracing or is some of it actual live network code?

>
> bpf jit can insert !null check before every such special load
> (since it knows all of them), but it's an obvious performance loss
> that would be good to avoid. If fault handler can do this
> if (addr == 0 && ...) search_bpf_extables()
> at least in some conditions and cpu flags it's already a win.
> In all other cases bpf jit will insert !null checks.

Again, there is no guarantee that a page fault is even generated for
this.  And it doesn't seem very reasonable for the fault code to have
to decide whether a NULL pointer dereference is a special BPF fault or
a real user access fault against a VMA at address 9.

--Andy