Re: [RFC] Proposal: Static SECCOMP Policies

Maciej Żenczykowski <maze@xxxxxxxxxx> · Mon, 30 Sep 2024 16:41:19 -0700

On Mon, Sep 30, 2024 at 4:35 PM Maciej Żenczykowski <maze@xxxxxxxxxx> wrote:
>
> On Mon, Sep 30, 2024 at 4:22 AM Sebastian Ene <sebastianene@xxxxxxxxxx> wrote:
> >
> > On Wed, Sep 25, 2024 at 12:53:11PM -0700, 'Maciej Żenczykowski' via kernel-team wrote:
> > > On Wed, Sep 25, 2024 at 12:52 PM Maciej Żenczykowski <maze@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Sep 25, 2024 at 11:16 AM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Sep 17, 2024 at 8:08 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Sep 13, 2024 at 09:18:58PM GMT, Andy Lutomirski wrote:
> > > > > > > On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > > > > > > On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
> > > > > > > >
> > > > > > > > But don't let me distract from the issue, which is that
> > > > > > > > cBPF/eBPF/however these filters get allocated to machine code,
> > > > > > > > bpf_int_jit_compile ends up getting called and a new
> > > > > > > > privileged-executable page gets allocated without compile-time
> > > > > > > > provenance (at least, without reverse engineering) for where that code
> > > > > > > > came from.
> > > > > > >
> > > > > > > But what if there was a mechanism to *cryptographically hash* a BPF
> > > > > > > program as part of the loading process?  Then that hash could be
> > > > > > > looked up in a list, and a decision could be made based on the result?
> > > > > > >  Would this help solve any problems?
> > > > > >
> > > > > > The issue I have seen in the prior Qualys linked exploit from my initial
> > > > > > message and from talks by security researchers elsewhere, for example
> > > > > > Google Project Zero's recent "Analyzing a Modern In-the-wild Android
> > > > > > Exploit" by Seth Jenkins, is that people have the ability to target
> > > > > > these pages during the window between the page being allocated as
> > > > > > writable by vmalloc.c and the update to the PTE which makes it
> > > > > > executable, so a signature does help (creates the requirement of more
> > > > > > than one write to commit "forgery"), but doesn't totally 100% solve the
> > > > > > problem.
> > > > > >
> > > > > > Right now, every time I open up chrome on our latest flagship the
> > > > > > > browser's sandbox filters trigger my EL2 monitor because they are
> > > > > > attempting to follow the standard W^X protocol. If I were to build one
> > > > > > of these exploits, I'd:
> > > > > >
> > > > > > (1) find out a non-crashing leak for code page and data values
> > > > > > (2) determine from vmalloc's rb-tree where the next one-page allocation
> > > > > >     is likely to occur
> > > > > > (3) prime my write gadget for an offset into that page
> > > > > > (4) spin up chrome in a second thread
> > > > > > (5) attempt to trigger a write (or two) at the right precise time using
> > > > > >     prior empirical measurement or my read gadget for kernel mem
> > > > > >
> > > > > > Which is messy, but people have been known to do more given good enough
> > > > > > stakes. Hell, I spent a few months working on something similar for
> > > > > > airplane communication management units.
> > > > >
> > > > > My vague proposal for a "better JIT API" (which you quoted below)
> > > > > explicitly and completely solves this problem:
> > > > >
> > > > > >
> > > > > > > > So what would a good solution look like?  It seems to me that the
> > > > > > > program being supervised (a userspace or kernel JIT) could generate
> > > > > > > some kind of data structure along these lines:
> > > > > > >
> > > > > > > - machine code to be materialized
> > > > > > >
> > > > > > > - address and length at which to materialize it (probably
> > > > > > > page-aligned, but maybe not)
> > > > > > >
> > > > > > > - an "origin" of this code (perhaps a file handle?) -- I'm not 100%
> > > > > > > sure this is useful
> > > > > > >
> > > > > > > - a "justification" for the code.  This could be something like "Hey,
> > > > > > > this is JITted from cBPF for seccomp, and here's the cBPF".
> > > > >
> > > > > Even ignoring the origin and justification parts, there's no WX window
> > > > > in here.  The code is generated, then it's shipped off to the
> > > > > hypervisor/supervisor, and *exactly that code* is materialized !W, X.
> > > > >
> > > > > Of course, this still leaves verification to be handled.
> > > > >
> > > > > > Returning to the idea of origins, at the end of the work day yesterday I
> > > > > > queried Maciej to "have Android choose one compiler for seccomp policies
> > > > > > to BPF and stick with it", because if I knew filters were chosen by
> > > > > > libminijail or some other userspace system, I could pretty easily figure
> > > > > > out what EL2 needs to expect at runtime. An "origin" field would be
> > > > > > equally as effective, and retain flexibility.
> > > > >
> > > > > At the risk of a silly suggestion, what if the entire JIT compiler and
> > > > > verifier (or a sufficient portion) were, itself, a WASM (or similar)
> > > > > program, signed or whatever, and shipped off to the hypervisor?  The
> > > > > hypervisor could run it (in whatever sandbox it likes -- hypervisors
> > > > > are capable of spawning a separate VM to host it if needed), and only
> > > > > then accept the output.
> > > > >
> > > > > I, personally, think that this is of extremely dubious value unless
> > > > > it's paired with a control flow integrity system.  But maybe it could
> > > > > be!  Something like x86 IBT would be a start, and FineIBT would be
> > > > > better, as would an ARM equivalent.
> > > > >
> > > > > --Andy
> > > >
> >
> > Hi,
> >
> > In response to your previous message (this is Seb from pKVM team):
> >
> >
> > > > I've heard rumours (probably read some LWN article perhaps
> > > > https://lwn.net/Articles/836693/ ) that protected kvm for Android has
> > > > some mechanism to start the kernel in some higher priv level (EL2?),
> > > > then move most of it to EL1 while keeping a protected VPN shim in EL2.
> > >
> > > s/VPN/KVM/
> >
> > Yes we do initialize the pKVM hypervisor at EL2 fairly early at
> > device_initcall_sync (initcall 5) before we deprivilege the rest of the
> > kernel at EL1.
>
> I'd love to learn more about this for some unrelated reasons.
> Even been considering dropping by London to chat about it (with Will)
> at some point.
>
> > > > Perhaps the answer is to leave the bpf verifier + jit compiler in EL2?
> > >
> > What are the gains from moving this to EL2? I am a bit late to this party.
> > We don't have any init at that stage because it is too early. We do
> > support some EL2 vendor modules loading from a ramdisk but this is a
> > different story.
>
> I think the OP is trying to verify the 'sanctity' of EL1 code pages.
> (ie. prove via signature that they're all legit, which is hard with jit)
> Presumably he's doing this from EL2 (I seriously doubt he's in EL3).
> There's been talk of
> unjitting/rejitting/regenerating/peephole-verifying the BPF jitted
> dynamically generated kernel executable pages - to verify they're
> 'safe'.
> Moving just the 'bpf verifier/jit' into EL2 would seem to solve that
> particular problem.
> Though of course that is a fair bit of code (though the only untrusted
> input to it, post boot completion, is cBPF which is pretty small in
> scope)...
> Compromises of EL0/EL1 would no longer be able to write gadgets over
> the BPF-jitted kernel executable pages prior to them being marked -W+X.
> I'm not certain how much of a win in safety this is though?
> I guess it depends on how easy the bpf verifier/jitter is to audit.

Note: if the full-blown bpf verifier/jitter is too hard to audit, you
could potentially write a new EL2 jitter just for cBPF.  It could just
be a trimmed-down version of the generic eBPF jitter.  cBPF is much,
much simpler.
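
To give a feel for how small the cBPF side is, here is a rough sketch
of the structural checks an EL2-side cBPF front end might run before
jitting.  It is only an illustration: the cbpf_* names are made up for
this mail, the instruction layout is assumed to match the kernel's
struct sock_filter, and the checks are a subset of what a real checker
would need (no opcode whitelist, no divide-by-zero check, no
seccomp-specific load restrictions).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to have the same layout as the kernel's struct sock_filter;
 * renamed (hypothetical) so the sketch builds without kernel headers. */
struct cbpf_insn {
    uint16_t code;  /* opcode */
    uint8_t  jt;    /* relative jump offset if condition is true */
    uint8_t  jf;    /* relative jump offset if condition is false */
    uint32_t k;     /* immediate / offset / scratch slot index */
};

/* Classic BPF opcode field encodings. */
#define CBPF_CLASS(c)  ((c) & 0x07)
#define CBPF_OP(c)     ((c) & 0xf0)
#define CBPF_MODE(c)   ((c) & 0xe0)
#define CBPF_LD        0x00
#define CBPF_LDX       0x01
#define CBPF_ST        0x02
#define CBPF_STX       0x03
#define CBPF_JMP       0x05
#define CBPF_RET       0x06
#define CBPF_JA        0x00
#define CBPF_MEM       0x60
#define CBPF_MAXINSNS  4096
#define CBPF_MEMWORDS  16

/* Returns true if every jump lands inside the program, every scratch
 * memory access is in range, and the program ends in a return.  Jumps
 * in cBPF are forward-only by construction (offsets are unsigned), so
 * passing this check also means the program has no loops. */
bool cbpf_structurally_valid(const struct cbpf_insn *prog, size_t len)
{
    if (prog == NULL || len == 0 || len > CBPF_MAXINSNS)
        return false;

    for (size_t pc = 0; pc < len; pc++) {
        const struct cbpf_insn *in = &prog[pc];
        size_t remaining = len - pc - 1;  /* instructions after this one */

        switch (CBPF_CLASS(in->code)) {
        case CBPF_JMP:
            if (CBPF_OP(in->code) == CBPF_JA) {
                /* Unconditional jump: target is pc + 1 + k. */
                if (in->k >= remaining)
                    return false;
            } else if (in->jt >= remaining || in->jf >= remaining) {
                /* Conditional jump: both targets must be in range. */
                return false;
            }
            break;
        case CBPF_ST:
        case CBPF_STX:
            /* Stores always go to one of the scratch slots. */
            if (in->k >= CBPF_MEMWORDS)
                return false;
            break;
        case CBPF_LD:
        case CBPF_LDX:
            /* Only loads in MEM mode index the scratch slots. */
            if (CBPF_MODE(in->code) == CBPF_MEM && in->k >= CBPF_MEMWORDS)
                return false;
            break;
        default:
            break;
        }
    }

    /* The last instruction must be a return so execution cannot fall
     * off the end of the program. */
    return CBPF_CLASS(prog[len - 1].code) == CBPF_RET;
}
```

The point is that a single linear pass is enough here: with no back
edges, no maps and no helper calls, there is far less state to reason
about than in the eBPF verifier.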

>
>
> >
> >
> > Thanks,
> > Seb