On Fri, Sep 13, 2024 at 10:30 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
>
> On Fri, Sep 13, 2024 at 05:07:46PM GMT, Maxwell Bland wrote:
>
> > These programs will not print out using PTRACE and are difficult to audit
> > without patching the seccomp calls yourself because the ptrace call to
> > PTRACE_SECCOMP_GET_FILTER will fail. I believe (have not checked) because they
> > are not cBPF, and seccomp's logic makes prog->fprog evaluates to null despite
> > prog existing if it is cBPF, at least on Android 14. I spent a whole day
> > getting frustrated with the failing ptrace call before finally ending up my
> > patches (attached to the end) that instrument ptrace and can print the
> > programs.
>
> LOL, this paragraph is a mess, apologies: I'm referencing the failure of
> get_seccomp_filter in seccomp.c here:
>
>         fprog = filter->prog->orig_prog;
>         if (!fprog) {
>                 /* This must be a new non-cBPF filter, since we save
>                  * every cBPF filter's orig_prog above when
>                  * CONFIG_CHECKPOINT_RESTORE is enabled.
>                  */
>                 ret = -EMEDIUMTYPE;
>                 goto out;
>         }
>
> Though CONFIG_CHECKPOINT_RESTORE is not set on Android 14, so I think
> the ptrace probably failed for all sorts of reasons unrelated to cBPF.
>
> But don't let me distract from the issue, which is that
> cBPF/eBPF/however these filters get allocated to machine code,
> bpf_int_jit_compile ends up getting called and a new
> privileged-executable page gets allocated without compile-time
> provenance (at least, without reverse engineering) for where that code
> came from.

Mulling over this a bit, I think there are sort of two issues here,
and they're sort of orthogonal to each other.

The easy one first: can there be a static or somewhat static or at
least administrator-controlled list of seccomp cBPF programs? (Where
administrator is, sadly, probably not the actual owner of a phone, but
that ship sailed a long time ago.)

Trying to make a list *and reference that list from programs loading
filters* seems like a huge breaking change, not to mention that
getting it to work right in namespaces will be extra complex. But
what if there was a mechanism to *cryptographically hash* a BPF
program as part of the loading process? Then that hash could be
looked up in a list, and a decision could be made based on the result.
Would this help solve any problems?

Okay, on to the hard part: code integrity. I've mulled over this a
bit from the perspective of userspace JITs and their interaction with
kernel-enforced security. Kernel-based JITs and their interactions
with hypervisor security are rather similar. (They're *not* the same.
The kernel can and does muck with its own pagetables. User code
can't. But I don't think this is a huge difference here as to the big
picture.)

There's also self-modifying code (existing executable code that
changes) and code generation (code that is created where code
previously didn't exist). I'm going to focus on the latter.

Today, userspace can use nasty APIs to allocate writable memory, then
write to it, then change it to be executable. This comes with gnarly
architecture-specific coherency issues, and it doesn't give a great
way for the kernel to render an intelligent opinion.

And, today, the kernel can allocate memory (by futzing with pagetables
or just using existing maps), write some code, then either change the
permissions to executable or create a new executable alias, and then
do the architecture-specific incantation to make it coherent, then run
it.

In neither case is there an amazing way for the supervisor (kernel or
hypervisor) to render an opinion about the code, and in the userspace
case, the actual efficiency of the process is quite low.

So what would a good solution look like?

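(For concreteness, here is a minimal sketch of the userspace dance
described above: allocate writable memory, write code into it, flip it
to executable, do the coherency incantation. The names materialize_wx,
jit_fn and ret42 are made up for illustration, and the sample bytes
only cover x86 and arm64. Note that the kernel only ever sees an
anonymous mapping change permissions; nothing tells it where the bytes
came from.)

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

/* Illustrative payload: a function that just returns 42. */
#if defined(__x86_64__) || defined(__i386__)
static const unsigned char ret42[] = {
	0xb8, 0x2a, 0x00, 0x00, 0x00,	/* mov eax, 42 */
	0xc3,				/* ret */
};
#elif defined(__aarch64__)
static const unsigned int ret42[] = {
	0x52800540,			/* mov w0, #42 */
	0xd65f03c0,			/* ret */
};
#else
#error "sample machine code is only provided for x86 and arm64"
#endif

/*
 * Hypothetical helper: materialize `len` bytes of machine code the way
 * userspace JITs commonly do it today. Returns NULL on failure.
 */
static jit_fn materialize_wx(const void *code, size_t len)
{
	void *p;

	assert(len > 0);

	/* 1. Allocate writable, non-executable anonymous memory. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* 2. Write the generated code into it. */
	memcpy(p, code, len);

	/* 3. Drop write permission and add execute permission. */
	if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(p, len);
		return NULL;
	}

	/*
	 * 4. Architecture-specific coherency step. This compiler builtin
	 *    does nothing on x86 and performs the required cache
	 *    maintenance on arm64.
	 */
	__builtin___clear_cache((char *)p, (char *)p + len);

	return (jit_fn)p;
}
```

Calling materialize_wx(ret42, sizeof(ret42)) and then invoking the
returned pointer should return 42 on those two architectures, assuming
no LSM policy forbids the mprotect().
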
It seems to me that the program being supervised (a userspace or
kernel JIT) could generate some kind of data structure along these
lines:

- machine code to be materialized
- address and length at which to materialize it (probably
  page-aligned, but maybe not)
- an "origin" of this code (perhaps a file handle?) -- I'm not 100%
  sure this is useful
- a "justification" for the code. This could be something like "Hey,
  this is JITted from cBPF for seccomp, and here's the cBPF".

Or there could be a more indirect variant:

- source to be JITted (cBPF, WASM, eBPF, whatever)
- enough relocation info for the supervisor to JIT it appropriately
- address to materialize the code at, along with maximum size

and the supervisor JITs it and materializes it.

I could imagine this being used for userspace and for hypervisor-based
kernel integrity. Does it do what's needed here if there was a
hypercall kind of like this?

I can also imagine this being considerably faster than what current
userspace does. On x86, for example, the kernel could populate a page
with the JITted code, then map that page at an address where nothing
was previously mapped, and return to userspace, and userspace could
execute that code, even on a different CPU, with no heavyweight
serialization at all.

I think the only practical way on Linux today to do this would be to
create a memfd, use write(2) or similar to fill in the code, then mmap
it executable. And to fight with LSMs to make sure they allow it and
to maybe seal it as read-only before mmapping it. That latter bit
kind of kills it if the goal is to write a web browser, though -- you
don't really want a whole new memfd for each JavaScript block that
gets JITted.

Is any of this helpful?

--Andy
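
A rough sketch of the memfd route mentioned above, in case it helps
the discussion: create a memfd, write(2) the code into it, seal it,
and only then map it executable. The names materialize_memfd, jit_fn
and ret42 are made up for illustration, the sample bytes only cover
x86 and arm64, and whether an LSM permits the final mmap() depends on
local policy.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*jit_fn)(void);

/* Illustrative payload: a function that just returns 42. */
#if defined(__x86_64__) || defined(__i386__)
static const unsigned char ret42[] = {
	0xb8, 0x2a, 0x00, 0x00, 0x00,	/* mov eax, 42 */
	0xc3,				/* ret */
};
#elif defined(__aarch64__)
static const unsigned int ret42[] = {
	0x52800540,			/* mov w0, #42 */
	0xd65f03c0,			/* ret */
};
#else
#error "sample machine code is only provided for x86 and arm64"
#endif

/*
 * Hypothetical helper: materialize code through a sealed memfd, so the
 * bytes are never writable through the mapping that executes them.
 * Returns NULL on failure.
 */
static jit_fn materialize_memfd(const void *code, size_t len)
{
	void *p = MAP_FAILED;
	int fd;

	assert(len > 0);

	/* 1. Anonymous file that permits sealing. */
	fd = memfd_create("jit-code", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return NULL;

	/*
	 * 2. Fill in the code with write(2), 3. seal the file so its
	 *    contents and size can no longer change, and 4. map it
	 *    read-only and executable.
	 */
	if (write(fd, code, len) == (ssize_t)len &&
	    fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_GROW |
				   F_SEAL_SHRINK | F_SEAL_SEAL) == 0)
		p = mmap(NULL, len, PROT_READ | PROT_EXEC,
			 MAP_PRIVATE, fd, 0);

	/* The mapping, if any, keeps the file alive after close(). */
	close(fd);

	return p == MAP_FAILED ? NULL : (jit_fn)p;
}
```

This costs one memfd_create(), one write(), one fcntl(), one mmap()
and one close() per blob, which is the per-block overhead that makes
the approach unattractive for a browser-style JIT.
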