Re: [RFC] Proposal: Static SECCOMP Policies

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Wed, 25 Sep 2024 11:16:00 -0700

On Tue, Sep 17, 2024 at 8:08 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
>
> On Fri, Sep 13, 2024 at 09:18:58PM GMT, Andy Lutomirski wrote:
> > On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
> > >
> > > But don't let me distract from the issue, which is that
> > > cBPF/eBPF/however these filters get allocated to machine code,
> > > bpf_int_jit_compile ends up getting called and a new
> > > privileged-executable page gets allocated without compile-time
> > > provenance (at least, without reverse engineering) for where that code
> > > came from.
> >
> > But what if there was a mechanism to *cryptographically hash* a BPF
> > program as part of the loading process?  Then that hash could be
> > looked up in a list, and a decision could be made based on the result?
> >  Would this help solve any problems?
>
> The issue I have seen in the prior Qualys linked exploit from my initial
> message and from talks by security researchers elsewhere, for example
> Google Project Zero's recent "Analyzing a Modern In-the-wild Android
> Exploit" by Seth Jenkins, is that people have the ability to target
> these pages during the window between the page being allocated as
> writable by vmalloc.c and the update to the PTE which makes it
> executable, so a signature does help (creates the requirement of more
> than one write to commit "forgery"), but doesn't totally 100% solve the
> problem.
>
> Right now, every time I open up Chrome on our latest flagship the
> browser's sandbox filters trigger my EL2 monitor because they are
> attempting to follow the standard W^X protocol. If I were to build one
> of these exploits, I'd:
>
> (1) find out a non-crashing leak for code page and data values
> (2) determine from vmalloc's rb-tree where the next one-page allocation
>     is likely to occur
> (3) prime my write gadget for an offset into that page
> (4) spin up Chrome in a second thread
> (5) attempt to trigger a write (or two) at the right precise time using
>     prior empirical measurement or my read gadget for kernel mem
>
> Which is messy, but people have been known to do more given good enough
> stakes. Hell, I spent a few months working on something similar for
> airplane communication management units.

My vague proposal for a "better JIT API" (which you quoted below)
explicitly and completely solves this problem:

>
> > So what would a good solution look like?  It seems to me that the
> > program being supervised (a userspace or kernel JIT) could generate
> > some kind of data structure along these lines:
> >
> > - machine code to be materialized
> >
> > - address and length at which to materialize it (probably
> > page-aligned, but maybe not)
> >
> > - an "origin" of this code (perhaps a file handle?) -- I'm not 100%
> > sure this is useful
> >
> > - a "justification" for the code.  This could be something like "Hey,
> > this is JITted from cBPF for seccomp, and here's the cBPF".

Even ignoring the origin and justification parts, there's no WX window
in here.  The code is generated, then it's shipped off to the
hypervisor/supervisor, and *exactly that code* is materialized !W, X.
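
To make that concrete, here is a rough sketch in C of what such a
request and the supervisor side could look like.  Every name in it
(struct jit_materialize_req, supervisor_materialize, the toy region
model) is made up for illustration and is not an existing kernel or
hypervisor interface, and the target region is modeled as a plain
buffer rather than real page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout; all field names are illustrative only. */
struct jit_materialize_req {
	const uint8_t *code;        /* machine code to be materialized */
	size_t code_len;
	uintptr_t dest;             /* address at which to materialize it */
	size_t dest_len;            /* length of the destination range */
	int origin_fd;              /* optional "origin" (file handle), -1 if none */
	const void *justification;  /* e.g. the cBPF this was JITted from */
	size_t justification_len;
};

enum region_state { REGION_UNMAPPED, REGION_RX };

/* Toy stand-in for the supervisor's private view of one target region. */
struct supervised_region {
	uintptr_t base;
	size_t len;
	uint8_t *backing;           /* only the supervisor ever writes here */
	enum region_state state;
};

/*
 * Copy exactly the submitted bytes and flip the region straight to R+X.
 * The requester never sees a writable mapping of the destination.
 * Returns 0 on success, -1 if the request is rejected.
 */
int supervisor_materialize(struct supervised_region *r,
			   const struct jit_materialize_req *req)
{
	assert(r != NULL && req != NULL);

	if (req->code == NULL || req->code_len == 0 ||
	    req->code_len > req->dest_len)
		return -1;

	/* Destination range must lie entirely inside the region. */
	if (req->dest < r->base || req->dest_len > r->len ||
	    req->dest - r->base > r->len - req->dest_len)
		return -1;

	/* Never rewrite a region that is already live code. */
	if (r->state != REGION_UNMAPPED)
		return -1;

	memcpy(r->backing + (req->dest - r->base), req->code, req->code_len);
	r->state = REGION_RX;
	return 0;
}
```

The point of this shape is that the copy and the switch to R+X happen
in one call on the supervisor side, and a second request against an
already-live region is refused.  The origin and justification fields
are carried along but deliberately not interpreted here.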

Of course, this still leaves verification to be handled.
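
As one possible starting point for verification, the hash-and-lookup
idea from earlier in the thread could be sketched roughly as follows.
This assumes the digest has already been computed elsewhere with a
cryptographic hash (the 32-byte length, as for SHA-256, is an
assumption), and the names digest_allowlist and program_is_allowed are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 32 /* assumption: a 256-bit digest such as SHA-256 */

/* Hypothetical allowlist of digests of known-good programs. */
struct digest_allowlist {
	const uint8_t (*entries)[DIGEST_LEN];
	size_t count;
};

/* Compare two digests without exiting early on the first mismatch. */
static int digest_equal(const uint8_t *a, const uint8_t *b)
{
	uint8_t diff = 0;

	for (size_t i = 0; i < DIGEST_LEN; i++)
		diff |= (uint8_t)(a[i] ^ b[i]);
	return diff == 0;
}

/* Returns 1 if the digest appears in the allowlist, 0 otherwise. */
int program_is_allowed(const struct digest_allowlist *list,
		       const uint8_t digest[DIGEST_LEN])
{
	int found = 0;

	assert(list != NULL && digest != NULL);

	/* Walk every entry so timing does not reveal the match position. */
	for (size_t i = 0; i < list->count; i++)
		found |= digest_equal(list->entries[i], digest);
	return found;
}
```

Whether the constant-time comparison matters in this setting is a
judgment call, but it is cheap.  What to do with the result (reject,
log, fall back to an interpreter) is a policy decision this sketch
leaves open.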

> Returning to the idea of origins, at the end of the work day yesterday I
> queried Maciej to "have Android choose one compiler for seccomp policies
> to BPF and stick with it", because if I knew filters were chosen by
> libminijail or some other userspace system, I could pretty easily figure
> out what EL2 needs to expect at runtime. An "origin" field would be
> equally effective, and retain flexibility.

At the risk of a silly suggestion, what if the entire JIT compiler and
verifier (or a sufficient portion) were, itself, a WASM (or similar)
program, signed or whatever, and shipped off to the hypervisor?  The
hypervisor could run it (in whatever sandbox it likes -- hypervisors
are capable of spawning a separate VM to host it if needed), and only
then accept the output.

I, personally, think that this is of extremely dubious value unless
it's paired with a control flow integrity system.  But maybe it could
be!  Something like x86 IBT would be a start, and FineIBT would be
better, as would an ARM equivalent.

--Andy