Re: [RFC] Proposal: Static SECCOMP Policies

Maciej Żenczykowski <maze@xxxxxxxxxx> · Mon, 30 Sep 2024 16:35:16 -0700

On Mon, Sep 30, 2024 at 4:22 AM Sebastian Ene <sebastianene@xxxxxxxxxx> wrote:
>
> On Wed, Sep 25, 2024 at 12:53:11PM -0700, 'Maciej Żenczykowski' via kernel-team wrote:
> > On Wed, Sep 25, 2024 at 12:52 PM Maciej Żenczykowski <maze@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Sep 25, 2024 at 11:16 AM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Sep 17, 2024 at 8:08 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Sep 13, 2024 at 09:18:58PM GMT, Andy Lutomirski wrote:
> > > > > > On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > > > > > On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
> > > > > > >
> > > > > > > But don't let me distract from the issue, which is that
> > > > > > > however these cBPF/eBPF filters get compiled to machine code,
> > > > > > > bpf_int_jit_compile ends up getting called and a new
> > > > > > > privileged-executable page gets allocated without compile-time
> > > > > > > provenance (at least, without reverse engineering) for where that code
> > > > > > > came from.
> > > > > >
> > > > > > But what if there was a mechanism to *cryptographically hash* a BPF
> > > > > > program as part of the loading process?  Then that hash could be
> > > > > > looked up in a list, and a decision could be made based on the result?
> > > > > >  Would this help solve any problems?
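A minimal C sketch of hash-then-look-up for a classic BPF program. The choices here are assumptions for illustration, not an existing kernel API: SHA-256 as the hash, a little-endian serialization of each instruction as the canonical form, and a flat array of 32-byte digests as the allowlist. `struct cbpf_insn` mirrors the field layout of `struct sock_filter`; `prog_digest()` and `prog_allowed()` are hypothetical names. As the reply that follows points out, a check like this constrains which programs get JITted but does not by itself close the window before the page becomes executable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define CBPF_MAXINSNS 4096 /* same limit as the kernel's BPF_MAXINSNS */

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

/* Compress one 64-byte block into the running state h[0..7]. */
static void sha256_block(uint32_t h[8], const uint8_t *blk)
{
    uint32_t w[64], s[8]; /* message schedule; working variables a..h */
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (int i = 16; i < 64; i++)
        w[i] = w[i - 16] + w[i - 7] +
               (ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3)) +
               (ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10));
    memcpy(s, h, sizeof(s));
    for (int i = 0; i < 64; i++) {
        uint32_t e = s[4], a = s[0];
        uint32_t t1 = s[7] + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) +
                      ((e & s[5]) ^ (~e & s[6])) + K[i] + w[i];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) +
                      ((a & s[1]) ^ (a & s[2]) ^ (s[1] & s[2]));
        memmove(s + 1, s, 7 * sizeof(uint32_t)); /* h=g, g=f, ..., b=a */
        s[4] += t1;                              /* e = d + t1 */
        s[0] = t1 + t2;                          /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++)
        h[i] += s[i];
}

/* One-shot SHA-256 of msg[0..len) into out[0..32). */
static void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint32_t h[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                     0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    uint8_t blk[64] = {0};
    size_t off = 0;
    for (; len - off >= 64; off += 64)
        sha256_block(h, msg + off);
    memcpy(blk, msg + off, len - off);
    blk[len - off] = 0x80;     /* a single 1 bit, then zero padding */
    if (len - off >= 56) {     /* no room left for the length field */
        sha256_block(h, blk);
        memset(blk, 0, sizeof(blk));
    }
    for (int i = 0; i < 8; i++) /* message length in bits, big-endian */
        blk[63 - i] = (uint8_t)((uint64_t)len * 8 >> (8 * i));
    sha256_block(h, blk);
    for (int i = 0; i < 32; i++)
        out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Same field layout as struct sock_filter in <linux/filter.h>. */
struct cbpf_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Digest over an explicit little-endian encoding of each instruction (an
 * assumed canonical form) so the result is independent of host byte order. */
static int prog_digest(const struct cbpf_insn *p, size_t n, uint8_t out[32])
{
    uint8_t *buf = (n && n <= CBPF_MAXINSNS) ? malloc(n * 8) : NULL;
    if (!buf)
        return -1;
    for (size_t i = 0; i < n; i++) {
        uint8_t *b = buf + 8 * i;
        b[0] = (uint8_t)p[i].code; b[1] = (uint8_t)(p[i].code >> 8);
        b[2] = p[i].jt;            b[3] = p[i].jf;
        for (int j = 0; j < 4; j++)
            b[4 + j] = (uint8_t)(p[i].k >> (8 * j));
    }
    sha256(buf, n * 8, out);
    free(buf);
    return 0;
}

/* Accept only if the digest appears in `allowlist`, a flat array of
 * `entries` 32-byte SHA-256 digests. */
static bool prog_allowed(const struct cbpf_insn *p, size_t n,
                         const uint8_t *allowlist, size_t entries)
{
    uint8_t d[32];
    if (prog_digest(p, n, d) != 0)
        return false;
    for (size_t i = 0; i < entries; i++)
        if (memcmp(d, allowlist + 32 * i, 32) == 0)
            return true;
    return false;
}
```

Changing any single field of any instruction changes the digest, so a tampered program falls off the allowlist; what this cannot catch is a write to the JITted output after the check has passed.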
> > > > >
> > > > > The issue I have seen in the Qualys exploit linked from my initial
> > > > > message, and in talks by security researchers elsewhere (for example
> > > > > Google Project Zero's recent "Analyzing a Modern In-the-wild Android
> > > > > Exploit" by Seth Jenkins), is that attackers can target
> > > > > these pages during the window between the page being allocated as
> > > > > writable by vmalloc.c and the PTE update that makes it
> > > > > executable. So a signature does help (it requires more
> > > > > than one write to commit "forgery"), but it doesn't fully solve the
> > > > > problem.
> > > > >
> > > > > Right now, every time I open Chrome on our latest flagship, the
> > > > > browser's sandbox filters trigger my EL2 monitor because they are
> > > > > attempting to follow the standard W^X protocol. If I were to build one
> > > > > of these exploits, I'd:
> > > > >
> > > > > (1) find a non-crashing leak of code page and data values
> > > > > (2) determine from vmalloc's rb-tree where the next one-page allocation
> > > > >     is likely to occur
> > > > > (3) prime my write gadget for an offset into that page
> > > > > (4) spin up Chrome in a second thread
> > > > > (5) attempt to trigger a write (or two) at precisely the right time, using
> > > > >     prior empirical measurement or my kernel-memory read gadget
> > > > >
> > > > > It's messy, but people have been known to do more given high enough
> > > > > stakes. Hell, I spent a few months working on something similar for
> > > > > airplane communication management units.
> > > >
> > > > My vague proposal for a "better JIT API" (which you quoted below)
> > > > explicitly and completely solves this problem:
> > > >
> > > > >
> > > > > > So what would a good solution look like?  It seems to me that the
> > > > > > program being supervised (a userspace or kernel JIT) could generate
> > > > > > some kind of data structure along these lines:
> > > > > >
> > > > > > - machine code to be materialized
> > > > > >
> > > > > > - address and length at which to materialize it (probably
> > > > > > page-aligned, but maybe not)
> > > > > >
> > > > > > - an "origin" of this code (perhaps a file handle?) -- I'm not 100%
> > > > > > sure this is useful
> > > > > >
> > > > > > - a "justification" for the code.  This could be something like "Hey,
> > > > > > this is JITted from cBPF for seccomp, and here's the cBPF".
> > > >
> > > > Even ignoring the origin and justification parts, there's no WX window
> > > > in here.  The code is generated, then it's shipped off to the
> > > > hypervisor/supervisor, and *exactly that code* is materialized !W, X.
> > > >
> > > > Of course, this still leaves verification to be handled.
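The request Andy describes can be modeled in C roughly as follows. This is a hypothetical userspace model, not an existing kernel or pKVM interface: `struct jit_request`, `materialize()`, the page-alignment rule, the `jit_verifier` hook and the simulated `exec_region` are all assumptions made for illustration. It shows the property Andy points out: the requester only hands over bytes, and the supervisor copies them into pages the requester can never write, then publishes them R+X, so no writable-and-executable window is ever exposed to the requester. Verification is left as a callback, matching the caveat that it still has to be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096
#define REGION_PAGES 4

/* Page states as seen from outside the supervisor: never writable. */
enum page_perm { PERM_NONE, PERM_RX };

/* Hypothetical request mirroring the fields listed above. */
struct jit_request {
    const uint8_t *code;       /* machine code to be materialized */
    size_t len;
    uintptr_t addr;            /* target address (page-aligned, by assumption) */
    uint32_t origin;           /* e.g. an ID naming the generating compiler */
    const void *justification; /* e.g. the cBPF the code was JITted from */
    size_t justification_len;
};

/* Pluggable, hypothetical verification hook. */
typedef bool (*jit_verifier)(const struct jit_request *req);

/* Simulated supervisor-owned memory backing the requester's code pages. */
struct exec_region {
    uintptr_t base;
    uint8_t mem[REGION_PAGES * PAGE_SZ];
    enum page_perm perm[REGION_PAGES];
};

/* Verify, copy and publish R+X entirely on the supervisor side; the
 * requester never holds a writable alias of the destination pages. */
static int materialize(struct exec_region *r, const struct jit_request *req,
                       jit_verifier verify)
{
    if (req->len == 0 || req->addr % PAGE_SZ || req->addr < r->base ||
        req->len > sizeof(r->mem) ||
        req->addr - r->base > sizeof(r->mem) - req->len)
        return -1; /* misaligned or out of range */
    if (!verify(req))
        return -2; /* verifier rejected the code */
    size_t off = req->addr - r->base;
    size_t first = off / PAGE_SZ, last = (off + req->len - 1) / PAGE_SZ;
    for (size_t pg = first; pg <= last; pg++)
        if (r->perm[pg] != PERM_NONE)
            return -3; /* refuse to overwrite live code */
    memcpy(r->mem + off, req->code, req->len);
    for (size_t pg = first; pg <= last; pg++)
        r->perm[pg] = PERM_RX;
    return 0;
}

/* Toy verifier: treats the justification as the exact expected bytes,
 * i.e. an identity "JIT". A real one might re-JIT cBPF and compare. */
static bool toy_verify(const struct jit_request *req)
{
    return req->justification_len == req->len &&
           memcmp(req->justification, req->code, req->len) == 0;
}
```

Keeping the verifier pluggable is deliberate here: the same materialization path works whether "verification" means an allowlist lookup, a deterministic re-JIT and compare, or something stronger.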
> > > >
> > > > > Returning to the idea of origins, at the end of the work day yesterday I
> > > > > asked Maciej to "have Android choose one compiler for seccomp policies
> > > > > to BPF and stick with it", because if I knew filters were produced by
> > > > > libminijail or some other userspace system, I could pretty easily figure
> > > > > out what EL2 should expect at runtime. An "origin" field would be
> > > > > equally effective, and would retain flexibility.
> > > >
> > > > At the risk of a silly suggestion, what if the entire JIT compiler and
> > > > verifier (or a sufficient portion) were, itself, a WASM (or similar)
> > > > program, signed or whatever, and shipped off to the hypervisor?  The
> > > > hypervisor could run it (in whatever sandbox it likes -- hypervisors
> > > > are capable of spawning a separate VM to host it if needed), and only
> > > > then accept the output.
> > > >
> > > > I, personally, think that this is of extremely dubious value unless
> > > > it's paired with a control flow integrity system.  But maybe it could
> > > > be!  Something like x86 IBT would be a start, and FineIBT would be
> > > > better, as would an ARM equivalent.
> > > >
> > > > --Andy
> > >
>
> Hi,
>
> In response to your previous message (this is Seb from pKVM team):
>
>
> > > I've heard rumours (probably from some LWN article, perhaps
> > > https://lwn.net/Articles/836693/) that protected KVM for Android has
> > > some mechanism to start the kernel at a higher privilege level (EL2?),
> > > then move most of it to EL1 while keeping a protected VPN shim in EL2.
> >
> > s/VPN/KVM/
>
> Yes, we initialize the pKVM hypervisor at EL2 fairly early, at
> device_initcall_sync (initcall 5), before we deprivilege the rest of the
> kernel to EL1.

I'd love to learn more about this for some unrelated reasons.
I've even been considering dropping by London to chat about it (with Will)
at some point.

> > > Perhaps the answer is to leave the bpf verifier + jit compiler in EL2?
> >
> What would we gain by moving this to EL2? I am a bit late to this party.
> We don't have any init at that stage because it is too early. We do
> support loading some EL2 vendor modules from a ramdisk, but that is a
> different story.

I think the OP is trying to verify the 'sanctity' of EL1 code pages
(i.e. prove via signature that they're all legit, which is hard with JIT).
Presumably he's doing this from EL2 (I seriously doubt he's in EL3).
There's been talk of
unjitting/rejitting/regenerating/peephole-verifying the BPF-jitted,
dynamically generated kernel executable pages to verify they're
'safe'.
Moving just the BPF verifier/JIT into EL2 would seem to solve that
particular problem.
Of course, that is a fair bit of code (though the only untrusted
input to it, once boot completes, is cBPF, which is pretty small in
scope)...
A compromised EL0/EL1 would no longer be able to write a gadget over
the BPF-jitted kernel executable pages before they are marked -W+X.
I'm not certain how much of a safety win this is, though.
I guess it depends on how easy the BPF verifier/JIT is to audit.
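To give a feel for how small the untrusted cBPF surface is, here is a rough C sketch of the structural checks an EL2-resident checker might run before JITting. It covers only a subset of what the kernel's classic BPF checker does (program length, in-bounds jump targets, a terminating RET, division by a constant zero, scratch-memory slot indices) and omits seccomp's own opcode restrictions. `struct cbpf_insn` mirrors `struct sock_filter`; `cbpf_check()` and the short macro names are hypothetical, though the opcode values match the classic BPF encoding. Because classic BPF jump offsets are unsigned and relative to the next instruction, every jump goes forward, so an accepted program cannot loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CBPF_MAXINSNS 4096 /* kernel's BPF_MAXINSNS */
#define CBPF_MEMWORDS 16   /* kernel's BPF_MEMWORDS scratch slots */

/* Same field layout as struct sock_filter in <linux/filter.h>. */
struct cbpf_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Opcode fields, with the same values as the classic BPF macros. */
#define CLS(c)   ((c) & 0x07) /* BPF_CLASS */
#define OP(c)    ((c) & 0xf0) /* BPF_OP */
#define MODE(c)  ((c) & 0xe0) /* BPF_MODE */
#define SRC_X(c) ((c) & 0x08) /* BPF_SRC == BPF_X */

enum { LD = 0x00, LDX = 0x01, ST = 0x02, STX = 0x03, ALU = 0x04, JMP = 0x05, RET = 0x06 };
enum { JA = 0x00, DIV = 0x30, MOD = 0x90, MEM = 0x60 };

/* Returns 0 if the program passes these structural checks, -1 otherwise. */
static int cbpf_check(const struct cbpf_insn *p, size_t n)
{
    if (n == 0 || n > CBPF_MAXINSNS)
        return -1;
    for (size_t pc = 0; pc < n; pc++) {
        const struct cbpf_insn *in = &p[pc];
        switch (CLS(in->code)) {
        case JMP: /* offsets are unsigned and relative to pc + 1: forward only */
            if (OP(in->code) == JA) {
                if (in->k >= n - pc - 1)
                    return -1;
            } else if (pc + 1 + in->jt >= n || pc + 1 + in->jf >= n) {
                return -1;
            }
            break;
        case ALU: /* reject division or modulo by a constant zero */
            if ((OP(in->code) == DIV || OP(in->code) == MOD) &&
                !SRC_X(in->code) && in->k == 0)
                return -1;
            break;
        case LD:
        case LDX: /* loads from scratch memory must use a valid slot */
            if (MODE(in->code) == MEM && in->k >= CBPF_MEMWORDS)
                return -1;
            break;
        case ST:
        case STX: /* stores always target scratch memory */
            if (in->k >= CBPF_MEMWORDS)
                return -1;
            break;
        }
    }
    /* The last instruction must return: BPF_RET|BPF_K or BPF_RET|BPF_A. */
    uint16_t last = p[n - 1].code;
    return (last == (RET | 0x00) || last == (RET | 0x10)) ? 0 : -1;
}
```

A checker of this shape is a few dozen lines; the harder part to audit would be the JIT that runs after it.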

>
> Thanks,
> Seb