Re: [RFC] Proposal: Static SECCOMP Policies

Maxwell Bland <mbland@xxxxxxxxxxxx> · Tue, 17 Sep 2024 10:08:47 -0500

On Fri, Sep 13, 2024 at 09:18:58PM GMT, Andy Lutomirski wrote:
> On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
> >
> > But don't let me distract from the issue, which is that
> > cBPF/eBPF/however these filters get allocated to machine code,
> > bpf_int_jit_compile ends up getting called and a new
> > privileged-executable page gets allocated without compile-time
> > provenance (at least, without reverse engineering) for where that code
> > came from.
> 
> But what if there was a mechanism to *cryptographically hash* a BPF
> program as part of the loading process?  Then that hash could be
> looked up in a list, and a decision could be made based on the result?
>  Would this help solve any problems?

The issue I have seen, both in the Qualys-linked exploit from my initial
message and in talks by security researchers elsewhere (for example,
Google Project Zero's recent "Analyzing a Modern In-the-wild Android
Exploit" by Seth Jenkins), is that attackers can target these pages
during the window between the page being allocated as writable by
vmalloc.c and the PTE update that makes it executable. So a signature
does help (forging code then requires more than one write), but it
doesn't fully solve the problem.
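
To make the hash-and-lookup idea concrete, here is a minimal sketch,
not anything from an existing kernel or monitor. Every name in it
(digest64, digest_allowed, image_allowed) is hypothetical, and the
digest is 64-bit FNV-1a used purely as a placeholder: it is NOT
cryptographic, and a real design would use something like SHA-256. The
point is only that the check has to run on the final image, right
before the page flips to executable, not just at load time:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholder digest: 64-bit FNV-1a. NOT cryptographic. It stands in
 * for a real hash (e.g. SHA-256) only to keep the sketch short.
 */
static uint64_t digest64(const uint8_t *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;		/* FNV-1a prime */
	}
	return h;
}

/* Hypothetical allowlist lookup: linear scan over known-good digests. */
static bool digest_allowed(uint64_t d, const uint64_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == d)
			return true;
	return false;
}

/*
 * Check an image against the allowlist. In this sketch the same check
 * is meant to run twice: once when the program is loaded, and again
 * on the materialized bytes just before the page becomes executable.
 */
static bool image_allowed(const uint8_t *image, size_t len,
			  const uint64_t *list, size_t n)
{
	return digest_allowed(digest64(image, len), list, n);
}
```

A single-byte write into the page between the two checks changes the
digest, so the second check rejects the image. A write that lands
after the second check but before the PTE update is still not covered,
which is the remaining window described above.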

Right now, every time I open Chrome on our latest flagship, the
browser's sandbox filters trigger my EL2 monitor because they are
attempting to follow the standard W^X protocol. If I were to build one
of these exploits, I'd:

(1) find a non-crashing leak for code page and data values
(2) determine from vmalloc's rb-tree where the next one-page allocation
    is likely to occur
(3) prime my write gadget for an offset into that page
(4) spin up chrome in a second thread
(5) attempt to trigger a write (or two) at the right precise time using
    prior empirical measurement or my read gadget for kernel mem

Which is messy, but people have been known to do more given high enough
stakes. Hell, I spent a few months working on something similar for
airplane communication management units.

> So what would a good solution look like?  It seem to me that the
> program being supervised (a userspace or kernel JIT) could generate
> some kind of data structure along these lines:
> 
> - machine code to be materialized
> 
> - address and length at which to materialize it (probably
> page-aligned, but maybe not)
> 
> - an "origin" of this code (perhaps a file handle?) -- I'm not 100%
> sure this is useful
> 
> - a "justification" for the code.  This could be something like "Hey,
> this is JITted from cBPF for seccomp, and here's the cBPF".
> 
> Or there could be a more indirect variant:
> 
> - source to be JITed (cBPF, WASM, eBPF, whatever)
> 
> - enough relocation info for the supervisor to JIT it appropriately
> 
> - address to materialize the code at, along with maximum size
> 
> and the supervisor JITs it and materializes it.
> 
> I could imagine this being used for userspace and for hypervisor-based
> kernel integrity.  Does it do what's needed here if there was a
> hypercall kind of like this?
>

"Origin" to me seems like the most significant part, as it should be
possible for engineers to hack in the rest based upon the implicit
contract provided by the software that is trying to compile the program.
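
For illustration, the first variant of the quoted list could be written
down as a request descriptor along these lines. This is only a sketch
under assumptions: the struct, the enum values, and req_well_formed()
are hypothetical names, the 4 KiB page size is assumed, and requiring
page alignment is a simplification (the proposal says "probably
page-aligned, but maybe not"):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u	/* assumed page size for this sketch */

/* Hypothetical set of origins a supervisor might recognize. */
enum jit_origin {
	ORIGIN_UNKNOWN = 0,
	ORIGIN_SECCOMP_CBPF,
	ORIGIN_EBPF,
	ORIGIN_WASM,
};

/* One request to materialize code, mirroring the quoted field list. */
struct materialize_req {
	const uint8_t *code;		/* machine code to be materialized */
	uint64_t dest;			/* address to materialize it at */
	size_t len;			/* length of the code in bytes */
	enum jit_origin origin;		/* who produced this code */
	const void *justification;	/* e.g. the source cBPF; optional */
	size_t justification_len;
};

/* Structural checks only; this does not verify the code itself. */
static bool req_well_formed(const struct materialize_req *r)
{
	if (!r || !r->code || r->len == 0)
		return false;
	if (r->dest % PAGE_SZ != 0)		/* simplifying assumption */
		return false;
	if (r->dest + (uint64_t)r->len < r->dest)	/* address wraparound */
		return false;
	if (r->origin == ORIGIN_UNKNOWN)	/* origin is mandatory here */
		return false;
	return true;
}
```

The sketch makes origin the one field that cannot be omitted, while
the justification stays optional, matching the weighting above.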

Expanding on the other points: right now, I'm trying to see if it is
possible to orient EL2 so that there is little to no standard "runtime"
interface to the security monitor. Samsung historically had issues with
such interfaces leading to exploits, because the engineers (like me)
were not super skilled. That is, pushing the verification effort to EL2
would be more dangerous, since EL2's code would then carry the risk of
a bug in the JIT, such as an out-of-bounds write.

Returning to the idea of origins: at the end of the work day yesterday I
asked Maciej to "have Android choose one compiler for seccomp policies
to BPF and stick with it", because if I knew filters were chosen by
libminijail or some other userspace system, I could pretty easily figure
out what EL2 needs to expect at runtime. An "origin" field would be
equally effective and would retain flexibility.

Here's what I have now, which is actually enough to lock down most
everything except the seccomp filters and dynamic data structures
(kworker queues, e.g. call_usermode_exec_helper, will be the motivating
example at that point):

case MARK_RANGE_RO: /* Set the RO bit on a stage-2 PTE/PMD range */
case ADD_JUMP_ENTRY_LOOKUP: /* Add in exceptions for static_keys */
case LOCK: /* Prevent any further SMC calls outside of *_TUPLE */
case SPLIT_BLOCK: /* Demote (PMD) hugepage to PTEs */
case REGISTER_AMEM: /* Preserve region of physical mem for just EL2 */
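
As a rough sketch of how LOCK could gate those cases (this is not the
actual monitor code): the call IDs are the ones listed above, while
struct el2_state, el2_dispatch(), and EXAMPLE_TUPLE are hypothetical.
The *_TUPLE calls are not shown in this message, so EXAMPLE_TUPLE is
just a stand-in for that family, and the real handlers are elided:

```c
#include <stdbool.h>

enum el2_call {
	MARK_RANGE_RO,
	ADD_JUMP_ENTRY_LOOKUP,
	LOCK,
	SPLIT_BLOCK,
	REGISTER_AMEM,
	EXAMPLE_TUPLE,	/* hypothetical stand-in for the *_TUPLE calls */
};

struct el2_state {
	bool locked;	/* set once by LOCK, never cleared */
};

/* Returns 0 if the call is accepted, -1 if it is rejected. */
static int el2_dispatch(struct el2_state *st, enum el2_call id)
{
	/* After LOCK, only the *_TUPLE family is still reachable. */
	if (st->locked && id != EXAMPLE_TUPLE)
		return -1;

	switch (id) {
	case MARK_RANGE_RO:
	case ADD_JUMP_ENTRY_LOOKUP:
	case SPLIT_BLOCK:
	case REGISTER_AMEM:
	case EXAMPLE_TUPLE:
		return 0;	/* real handler elided in this sketch */
	case LOCK:
		st->locked = true;
		return 0;
	}
	return -1;		/* unknown call ID */
}
```

Once LOCK has been accepted, every later call outside the stand-in
tuple ID is rejected, including a second LOCK.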

Maxwell