On Fri, Sep 13, 2024 at 09:18:58PM GMT, Andy Lutomirski wrote:
> On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
> >
> > But don't let me distract from the issue, which is that
> > cBPF/eBPF/however these filters get allocated to machine code,
> > bpf_int_jit_compile ends up getting called and a new
> > privileged-executable page gets allocated without compile-time
> > provenance (at least, without reverse engineering) for where that code
> > came from.
>
> But what if there was a mechanism to *cryptographically hash* a BPF
> program as part of the loading process? Then that hash could be
> looked up in a list, and a decision could be made based on the result?
> Would this help solve any problems?

The issue I have seen in the prior Qualys-linked exploit from my initial
message, and in talks by security researchers elsewhere (for example
Google Project Zero's recent "Analyzing a Modern In-the-wild Android
Exploit" by Seth Jenkins), is that people can target these pages during
the window between the page being allocated as writable by vmalloc.c and
the PTE update which makes it executable. So a signature does help (it
creates the requirement of more than one write to commit "forgery"), but
it doesn't fully solve the problem.

Right now, every time I open up Chrome on our latest flagship, the
browser's sandbox filters trigger my EL2 monitor because they are
attempting to follow the standard W^X protocol.
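
To make that window concrete, here is a rough userspace model, not
kernel code, of a hash lookup that runs before the page is sealed.
Everything in it is hypothetical: struct jit_page, struct allowlist,
page_verify(), page_seal() and attacker_write() are names made up for
illustration, "sealed" stands in for the PTE update, and the digest is
64-bit FNV-1a used only as a short stand-in for a real cryptographic
hash such as SHA-256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in digest (64-bit FNV-1a). NOT cryptographic: a real design
 * would use e.g. SHA-256; FNV only keeps this sketch short. */
uint64_t digest(const uint8_t *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Hypothetical model of a JIT page; "sealed" stands in for the PTE
 * update that removes write permission and makes the page executable. */
struct jit_page {
	uint8_t code[64];
	size_t len;
	int sealed;
};

/* Hypothetical allowlist of approved program digests. */
struct allowlist {
	const uint64_t *digests;
	size_t count;
};

/* Returns 1 if the page's current contents hash to an allowed digest. */
int page_verify(const struct jit_page *p, const struct allowlist *al)
{
	assert(p->len <= sizeof(p->code));
	uint64_t d = digest(p->code, p->len);

	for (size_t i = 0; i < al->count; i++)
		if (al->digests[i] == d)
			return 1;
	return 0;
}

/* Models the permission flip; after this, writes no longer land. */
void page_seal(struct jit_page *p)
{
	p->sealed = 1;
}

/* Models the attacker's write gadget: it lands only while the page is
 * still writable. Returns 1 if the byte was written. */
int attacker_write(struct jit_page *p, size_t off, uint8_t val)
{
	if (p->sealed || off >= p->len)
		return 0;
	p->code[off] = val;
	return 1;
}
```

In this model, a write that lands after page_verify() but before
page_seal() goes unnoticed unless the check is repeated, which is the
sense in which the hash raises the bar without closing the window.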

If I were to build one of these exploits, I'd:

(1) find a non-crashing leak for code page and data values
(2) determine from vmalloc's rb-tree where the next one-page allocation
    is likely to occur
(3) prime my write gadget for an offset into that page
(4) spin up Chrome in a second thread
(5) attempt to trigger a write (or two) at precisely the right time,
    using prior empirical measurement or my read gadget for kernel mem

Which is messy, but people have been known to do more given high enough
stakes. Hell, I spent a few months working on something similar for
airplane communication management units.

> So what would a good solution look like? It seem to me that the
> program being supervised (a userspace or kernel JIT) could generate
> some kind of data structure along these lines:
>
> - machine code to be materialized
>
> - address and length at which to materialize it (probably
> page-aligned, but maybe not)
>
> - an "origin" of this code (perhaps a file handle?) -- I'm not 100%
> sure this is useful
>
> - a "justification" for the code. This could be something like "Hey,
> this is JITted from cBPF for seccomp, and here's the cBPF".
>
> Or there could be a more indirect variant:
>
> - source to be JITed (cBPF, WASM, eBPF, whatever)
>
> - enough relocation info for the supervisor to JIT it appropriately
>
> - address to materialize the code at, along with maximum size
>
> and the supervisor JITs it and materializes it.
>
> I could imagine this being used for userspace and for hypervisor-based
> kernel integrity. Does it do what's needed here if there was a
> hypercall kind of like this?
>

"Origin" to me seems like the most significant part, as it should be
possible for engineers to hack in the rest based upon the implicit
contract provided by the software that is trying to compile the program.
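
As a sketch only, the first variant of the proposed data structure
could look something like the following, with "origin" as the field the
supervisor keys its policy on. None of these names exist anywhere:
struct jit_request, enum jit_origin, struct origin_policy and
jit_request_allowed() are hypothetical, the 4096-byte page size is an
assumption, and so is the choice to require page alignment (the
proposal leaves that open).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical set of code origins the supervisor knows about. */
enum jit_origin {
	ORIGIN_UNKNOWN = 0,
	ORIGIN_SECCOMP_CBPF,	/* e.g. cBPF from a seccomp policy */
	ORIGIN_EBPF,
	ORIGIN_USERSPACE_JIT,
};

/* Hypothetical request handed to the supervisor by the JIT. */
struct jit_request {
	const uint8_t *code;		/* machine code to materialize */
	uint64_t dest_addr;		/* where to materialize it */
	size_t len;			/* length of the machine code */
	enum jit_origin origin;		/* who produced the code */
	const void *justification;	/* e.g. the cBPF it was JITed from */
	size_t justification_len;
};

/* Hypothetical per-origin policy entry. */
struct origin_policy {
	enum jit_origin origin;
	size_t max_len;			/* largest allowed code size */
};

/* Returns 1 if the request is well formed and its origin is covered
 * by a policy entry that permits its size, 0 otherwise. */
int jit_request_allowed(const struct jit_request *req,
			const struct origin_policy *policy, size_t n)
{
	assert(req != NULL);

	if (req->code == NULL || req->len == 0)
		return 0;
	if (req->dest_addr % ASSUMED_PAGE_SIZE != 0)
		return 0;	/* assumption: require page alignment */
	if (req->justification == NULL || req->justification_len == 0)
		return 0;

	for (size_t i = 0; i < n; i++)
		if (policy[i].origin == req->origin)
			return req->len <= policy[i].max_len;
	return 0;		/* unknown origin: reject */
}
```

The check deliberately does nothing with the justification beyond
requiring one; interpreting it is exactly the part that would pull a
JIT or verifier into the supervisor.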

Expanding on the other points: right now, I'm trying to see if it is
possible to orient EL2 so that there is little to no standard "runtime"
interface to the security monitor, as Samsung historically had issues
with these routes leading to exploits because the engineers (like me)
were not super skilled. That is, pushing the verification effort to EL2
would be more dangerous, since EL2's code would then carry the risk of a
bug in the JIT, such as an out-of-bounds write.

Returning to the idea of origins, at the end of the work day yesterday I
asked Maciej to "have Android choose one compiler for seccomp policies
to BPF and stick with it", because if I knew filters were chosen by
libminijail or some other userspace system, I could pretty easily figure
out what EL2 needs to expect at runtime. An "origin" field would be
equally effective, and retain flexibility.

Here's what I have now, which is actually enough to lock down almost
everything except the seccomp filters and dynamic data structures
(kworker queues, e.g. call_usermode_exec_helper, will be the motivating
example at that point):

    case MARK_RANGE_RO: /* Set the RO bit on a stage-2 PTE/PMD range */
    case ADD_JUMP_ENTRY_LOOKUP: /* Add in exceptions for static_keys */
    case LOCK: /* Prevent any further SMC calls outside of *_TUPLE */
    case SPLIT_BLOCK: /* Demote (PMD) hugepage to PTEs */
    case REGISTER_AMEM: /* Preserve region of physical mem for just EL2 */

Maxwell