Another long email follows. The TL;DR: considering related issues such
as changes in cBPF, and some thoughts regarding Google's maintenance of
seccomp inside Android, Android maintainers should decide to either "use
minijail" or "use bionic's tools" for compiling policies to BPF. Is
there any reason multiple seccomp-policy-to-BPF compilers need to exist
in AOSP (or even, maybe, in Linux's use of seccomp generally)? Shifting
to a single project for policy compilation would remove the duplicated
effort of maintaining several such compilers, solve the code page
integrity issue, and reduce potential sources of policy compiler errors.

See below.

On Fri, Sep 13, 2024 at 02:16:40PM GMT, Maciej Żenczykowski wrote:
> On Fri, Sep 13, 2024 at 10:07 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > Add a hook to seccomp which triggers/enables hooks in BPF's JIT to instrument
> > the output machine code page so that EL2 can (1) invert the machine code back
> > to BPF then (2) check the BPF corresponds to a valid seccomp filter policy.
>
> If you care that deeply about this: you could simply turn of jit
> compilation of cBPF (including seccomp) - but you'll take a
> performance hit.
> If you care about performance you could only jit compile *recognized*
> cBPF programs.
> Hell, instead of jit-ing them you could replace them with outright
> (pre)compiled into the kernel native functions that accomplish the
> same thing.
> There's probably only somewhere <10 of these in common use / part of
> the platform.
> That said, you'd still pay a performance hit for (Chrome web browser
> style) sandboxes since those policies *will* be updated without os
> updates. Similarly with the mainline shipped cBPF code (which does
> process all packets) - you can't guarantee it won't change.

I am hesitant to turn off JIT, as a few months ago I got a warning from
Alexei Starovoitov about this approach:

https://lore.kernel.org/all/CAADnVQJCxFt2R=fbqx1T_03UioAsBO4UXYGh58kJaYHDpMHyxw@xxxxxxxxxxxxxx/

I would also be hesitant for Moto (or anyone) to maintain a dynamic list
of acceptable code pages for each AOSP (or subpackage) release, since
the list will only grow with time. It would also be really difficult for
me to even begin to figure out whether I have "caught" all of them,
since Qualcomm services use seccomp and I have no idea if I am testing
every edge condition in the phone while developing this.

Without knowing exactly what these code pages will be, and given the
dangers of, or growing lack of support for, the BPF interpreter: the
current SYS_seccomp user environment, e.g. libminijail or bionic's libc
or whatever Qualcomm is using, ends up being the de facto specification
of the seccomp BPF "language", rather than a translation layer over a
standard policy file format that gets uniformly translated to BPF for
the kernel's consumption.

The disconnect is that the current seccomp.c semantics _only_ encode the
cBPF operations and some sanity checking for the ranges of referenced
memory; seccomp.c is currently not sufficient to provide an
EL2-enforceable or Android-enforceable contract on the integrity of the
desired policy.

For example, I took some measurements today on-device, and the three
programs that were triggering EL2-level code page integrity failures in
the basic case follow the same general structure:

- Load system call _NR_ definition values
- Generate "priority" JEQ statements (opcode 0x15)
- Generate additional jump statements (opcode 0xa5, 0x35, etc)
- Standard(ish) suffix consisting of loads/movs/exits (opcode 0x61,
  0xb4, 0x95)

But there is nothing to guarantee that this is what will happen for
arbitrary programs with SYS_seccomp permission, as they could be using
different generators for their BPF.

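To make the structure above concrete, here is a rough userspace sketch
of an opcode allowlist check. It is not the EL2 code: it assumes the JIT
output has already been inverted back into an array of BPF instructions,
and the names struct insn and looks_like_case_statement are hypothetical.
The allowlist holds only the opcodes listed above (whatever hides behind
the "etc" is not covered), and the forward-only, in-bounds jump rule is
an added assumption, not something seccomp.c states in this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one decoded BPF instruction (8 bytes). */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t regs;   /* dst/src register nibbles, unused here */
    int16_t off;    /* jump offset, relative to the next instruction */
    int32_t imm;    /* immediate, e.g. the syscall number compared */
};

/* Conditional jumps seen on-device: 0x15, 0xa5, 0x35. */
static bool is_cond_jump(uint8_t code)
{
    return code == 0x15 || code == 0xa5 || code == 0x35;
}

/* Loads, movs and exits seen on-device: 0x61, 0xb4, 0x95. */
static bool is_load_mov_exit(uint8_t code)
{
    return code == 0x61 || code == 0xb4 || code == 0x95;
}

/*
 * Return true if the program uses only allowlisted opcodes, every jump
 * goes forward and stays in bounds (assumption), and the last
 * instruction is an exit.
 */
bool looks_like_case_statement(const struct insn *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    if (n == 0)
        return false;

    for (size_t i = 0; i < n; i++) {
        uint8_t code = insns[i].code;

        if (is_cond_jump(code)) {
            /* Jump target is i + 1 + off; it must be < n. */
            if (insns[i].off < 0 ||
                (size_t)insns[i].off >= n - i - 1)
                return false;
        } else if (!is_load_mov_exit(code)) {
            return false;
        }
    }
    return insns[n - 1].code == 0x95;
}
```

Anything emitted by a generator with a different instruction mix would
be rejected by a check like this even if its policy is fine, which is
the maintenance problem in miniature.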

For example, compile_seccomp_policy.py under the minijail project and
genseccomp.py under the bionic libc project solve this same problem in
two different ways: both generate a couple of _NR_ checks and jump
statements, but with different Python code.

Can Android just say "use minijail" or "use bionic's tools" and call it
a day, similar to the intent system, or binder, or any number of the
ecosystem "hard rules"? That way, Google also does not have to maintain
two separate projects doing the same thing, we can figure out what the
heck Qualcomm is doing, and I can sleep better at night.

Seccomp is not C, and there is no fight over clang vs gcc: system call
numbers are baked into struct seccomp_data, so why bother with multiple
(potentially buggy and differing in flexibility) ways of compiling the
desired policy into BPF? Maybe this is too opinionated, but the nice
world we would get as a result is one where every single code page in
Android's kernel would be verifiable (and, if it were adopted in Linux
generally, in most ARM systems).

Regardless, the clear hack, to me, is that when EL2 gets a code page
integrity failure on one of these seccomp pages, for now I do some
simple binary analysis to check that the code page consists only of what
is effectively a giant case statement. Over time, this needs to be
refined to ensure the adversary has not mucked with the policy in a way
that is still structurally valid, like seccomp_check_filter in
kernel/seccomp.c but better.

> I guess for the mainline shipped cBPF programs we could technically
> probably swap them for eBPF. Taking a quick glance at uses of
> BpfClassic.h in aosp I see 6 socket filter cBPF programs, of which
> only 1 is dynamic (for matching clat IP addresses), so the remaining 5
> are probably trivial to eBPF-ify (and thus hide behind selinux
> restrictions).

clatd, netd, gpuWork, and others have turned out not to be an issue so
far (or at least I have not run into any code page errors yet), maybe
because I'm running drivers for the kernel protection at the book-ends
of the kernel boot process. The first runs prior to any memory
allocation, so that it can ensure pages get allocated in regions
permissible for the Snapdragon chipset's performance constraints on EL2
write checks. The second runs after the allocation of all boot-time
kernel modules and BPF program loads, since at that point I can check
the allocated pages against SHA256 hashes computed at build time from
the .ko files (considering holes for self-patching and static_keys),
only because I am paranoid someone will circumvent the existing verified
boot routines.

As mentioned, I will work with Motorola to see if I can figure out a
permissive license for the EL2 components for this part, especially
considering I have seen ... questionable promises ... regarding this
subject in my research, and an apparent lack of acknowledgement of
issues like dynamic data structures and seccomp filters from others (not
Google) promising hypervisor-enforced code integrity.

Thankfully, due to GPL-2.0 the EL1 drivers will be open source. I will
share them once they are ready, along with testcases of existing
exploits for page table modification, code page modification, system
control register modification, kworker queue manipulation, and BPF page
manipulation, like the below:

/*
 * Code page modification testcase: build a fake jump_entry whose code
 * field points at udp_recvmsg, then have arch_jump_label_transform()
 * (address taken from kp2.addr) patch it.
 */
#define MODIFY_KERNEL_CODE                                             \
	do {                                                           \
		fake_je = (struct jump_entry *)kallsyms_lookup_name_ind( \
			"spectre_bhb_state");                          \
		attack_addr = kallsyms_lookup_name_ind("udp_recvmsg"); \
		if (register_kprobe(&kp2)) {                           \
			return -1;                                     \
		}                                                      \
		arch_jump_label_transform =                            \
			(arch_jump_label_transform_t)kp2.addr;         \
		fake_je->code = attack_addr - (unsigned long)&(fake_je->code); \
		fake_je->target = stext - (unsigned long)&(fake_je->target); \
		arch_jump_label_transform(fake_je, JUMP_LABEL_JMP);    \
		return 0;                                              \
	} while (0)