Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly

Andy Lutomirski <luto@xxxxxxxxxx> · Thu, 28 Jan 2021 17:31:35 -0800

On Thu, Jan 28, 2021 at 5:04 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jan 28, 2021 at 04:45:41PM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 28, 2021 at 4:41 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 04:29:51PM -0800, Andy Lutomirski wrote:
> > > > BPF generated a NULL pointer dereference (where NULL is a user
> > > > pointer) and expected it to recover cleanly. What exactly am I
> > > > supposed to debug?  IMO the only thing wrong with the x86 code is that
> > > > it doesn't complain more loudly.  I will fix that, too.
> > >
> > > are you saying that NULL is a _user_ pointer?!
> > > It's NULL. All zeros.
> > > probe_read_kernel(NULL) was returning EFAULT on it and should continue doing so.
> >
> > probe_read_kernel() does not exist.  get_kernel_nofault() returns -ERANGE.
>
> That was an old name. bpf_probe_read_kernel() is using copy_from_kernel_nofault() now.
>
> > And yes, NULL is a user pointer.  I can write you a little Linux
> > program that maps some real valid data at user address 0.  As I noted
>
> are you sure? I thought mmap of addr zero was disallowed long ago.

Quite sure.

#define _GNU_SOURCE

#include <stdio.h>
#include <sys/mman.h>
#include <err.h>

int main()
{
    int *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS |
MAP_PRIVATE | MAP_FIXED, -1, 0);
    if (ptr == MAP_FAILED)
        err(1, "mmap");

    *ptr = 1;
    printf("I just wrote %d to %p\n", *ptr, ptr);
    return 0;
}

Whether this works or not depends on a complicated combination of
sysctl settings, process capabilities, and whether SELinux feels like
adding its own restrictions.  But it does work on current kernels.

>
> > when I first analyzed this bug, because NULL is a user address, bpf is
> > incorrectly triggering the *user* fault handling code, and that code
> > is objecting.
> >
> > I propose the following fix to the x86 code.  I'll send it as a real
> > patch tomorrow.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=f61282777772f375bba7130ae39ccbd7e83878b2
>
> You realize that you propose to panic kernels for all existing tracing users, right?
>
> Do you have a specific security concern with treating fault on NULL special?

The security concern is probably not that severe because of the
aforementioned restrictions, but I haven't analyzed it very carefully.
But I do have a *functionality* concern.  As the original email that
prompted this whole discussion noted, the current x86 fault code does
not do what you want it to do if SMAP is off.  The fact that it mostly
works with SMAP on is luck.  With SMAP off, we will behave erratically
at best.

What exactly could the fault code even do to fix this up?  Something like:

if (addr == 0 && SMAP off && error_code says it's kernel mode && we
don't have permission to map NULL) {
  special care for bpf;
}

This seems arbitrary and fragile.  And it's not obviously
implementable safely without taking locks, which we really ought not
to be doing from inside arbitrary bpf programs.  Keep in mind that, if
SMAP is unavailable, the fault code literally does not know whether
the page fault originated form a valid uaccess region.

There's also always the possibility that you simultaneously run bpf
and something like Wine or DOSEMU2 that actually maps the zero page,
in which case the effect of the bpf code is going to be quite erratic
and will depend on which mm is loaded.

--Andy