On Mon, Sep 16, 2024 at 3:18 PM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
>
> Another long email follows. The TL;DR is that, considering related issues
> such as changes in cBPF and some interesting thoughts regarding Google's
> maintenance of seccomp inside Android, Android maintainers should make
> the decision to "use minijail" or "use bionic's tools" for compiling
> policies to BPF. Is there any reason multiple seccomp policy to BPF
> program compilers need to exist in the AOSP (or even, maybe, Linux's use
> of seccomp)? The shift to a single project for policy compilation to BPF
> would remove duplicate effort in maintaining seccomp policy to BPF
> compilers, solve the code page integrity issue, and lower potential
> sources of policy compiler errors. See below.
>
> On Fri, Sep 13, 2024 at 02:16:40PM GMT, Maciej Żenczykowski wrote:
> > On Fri, Sep 13, 2024 at 10:07 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > Add a hook to seccomp which triggers/enables hooks in BPF's JIT to instrument
> > > the output machine code page so that EL2 can (1) invert the machine code back
> > > to BPF then (2) check the BPF corresponds to a valid seccomp filter policy.
> >
> > If you care that deeply about this: you could simply turn off jit
> > compilation of cBPF (including seccomp) - but you'll take a
> > performance hit.
> > If you care about performance you could only jit compile *recognized*
> > cBPF programs.
> > Hell, instead of jit-ing them you could replace them with outright
> > (pre)compiled into the kernel native functions that accomplish the
> > same thing.
> > There's probably only somewhere <10 of these in common use / part of
> > the platform.
> > That said, you'd still pay a performance hit for (Chrome web browser
> > style) sandboxes since those policies *will* be updated without os
> > updates. Similarly with the mainline shipped cBPF code (which does
> > process all packets) - you can't guarantee it won't change.
>
> I am hesitant to turn off JIT, as a few months ago I got a
> warning from Alexei Starovoitov about this approach:
> https://lore.kernel.org/all/CAADnVQJCxFt2R=fbqx1T_03UioAsBO4UXYGh58kJaYHDpMHyxw@xxxxxxxxxxxxxx/
>
> I would be hesitant for Moto (or anyone) to maintain a dynamic list of
> acceptable code pages for each AOSP (or subpackage) release, and the
> list will only grow with time. It would be really difficult, as well,
> for me to even begin to figure out if I have "caught" all of them, since
> Qualcomm services use seccomp and I have no idea if I am testing every
> edge condition in the phone while developing this.
>
> Without knowing exactly what these code pages will be, and given the
> dangers of (or growing lack of support for) the BPF interpreter, the
> current SYS_seccomp user environment, e.g. libminijail or bionic's libc
> or whatever Qualcomm is using, ends up being the de facto specification
> of the seccomp BPF "language", rather than a translation layer to a
> standard policy file format which uniformly gets translated to BPF for
> the kernel's consumption. The disconnect is that the current seccomp.c
> semantics _only_ encode the cBPF operations and some sensibility
> checking for the ranges of referenced memory, but seccomp.c is currently
> not sufficient to provide an EL2-enforceable or Android-enforceable
> contract on the integrity of the desired policy.
>
> For example, I took some measurements today on-device, and the three
> programs that were triggering EL2-level code page integrity failures in
> the basic case follow the same general structure:
>
> - Load system call _NR_ definition values
> - Generate "priority" JEQ statements (opcode 0x15)
> - Generate additional jump statements (opcode 0xa5, 0x35, etc)
> - Standard(ish) suffix consisting of loads/movs/exits (opcode 0x61,0xb4,0x95)
>
> But there's nothing to guarantee that this is what will happen for
> arbitrary programs with SYS_seccomp permission, as they could be using
> different generators for their BPF. For example,
> compile_seccomp_policy.py under the minijail project and genseccomp.py
> under the bionic libc project solve this same problem in two different
> ways, though they both generate a couple of _NR_ checks and jump
> statements, but with different Python code.
>
> Can Android just say "use minijail" or "use bionic's tools" and call it
> a day, similar to the intent system, or binder, or any number of the
> ecosystem "hard rules"? That way, Google also does not have to maintain
> the two separate projects doing the same thing, we can figure out what
> the heck Qualcomm is doing, and I can sleep better at night. Seccomp is
> not C, there's not the fight over clang vs gcc: system call numbers are
> baked into struct seccomp_data, why bother with multiple (potentially
> buggy and differing in flexibility) ways of compiling the desired policy
> into BPF? Maybe this is too opinionated, but the nice world we would get
> as a result is every single code page in Android's kernel would be
> verifiable (and, if it were adopted in Linux generally, in most ARM
> systems).
>
> Regardless, the clear hack, to me, is that when EL2 gets a code page
> integrity failure on one of these seccomp pages, for now I do some
> simple binary analysis to check that the code page consists only of what
> is effectively a giant case statement.
> Over time, this needs to be
> refined to ensure the adversary has not mucked with the policy in a
> valid way, like seccomp_check_filter in kernel/seccomp.c but better.
>
> > I guess for the mainline shipped cBPF programs we could technically
> > probably swap them for eBPF. Taking a quick glance at uses of
> > BpfClassic.h in aosp I see 6 socket filter cBPF programs, of which
> > only 1 is dynamic (for matching clat IP addresses), so the remaining 5
> > are probably trivial to eBPF-ify (and thus hide behind selinux
> > restrictions).
>
> clatd, netd, gpuWork, and others turned out to not be an issue (or I
> have not run into any code page errors) yet, maybe because I'm running
> drivers for the kernel protection at the book-ends of the kernel boot
> process: one prior to any memory allocation so that it can ensure pages
> get allocated in regions permissible for the Snapdragon chipset's
> performance constraints on EL2 write checks, and the second after the
> allocation of all boot-time kernel modules and BPF program loads, since
> at that point I can check the allocated pages w.r.t. SHA256 hashes
> computed (considering holes for self-patching and static_keys) at build
> time using the .ko files, only because I am paranoid someone will
> circumvent the existing verified boot routines.
>
> As mentioned, I will work with Motorola to see if I can figure out a
> permissive license for the EL2 components for this part, especially
> considering I have seen ... questionable promises ... regarding this
> subject in my research and an apparent lack of acknowledgement of issues
> like dynamic data structures and seccomp filters from others (not Google)
> promising hypervisor-enforced code integrity. Thankfully, due to GPL-2.0
> the EL1 drivers will be open source.
> I will share them once they are
> ready with testcases of existing exploits for page table modification,
> code page modification, system control register modification, kworker
> queue manipulation, BPF page manipulation, like the below:
>
> #define MODIFY_KERNEL_CODE                                             \
>     do {                                                               \
>         fake_je = (struct jump_entry *)kallsyms_lookup_name_ind(       \
>             "spectre_bhb_state");                                      \
>         attack_addr = kallsyms_lookup_name_ind("udp_recvmsg");         \
>         if (register_kprobe(&kp2)) {                                   \
>             return -1;                                                 \
>         }                                                              \
>         arch_jump_label_transform =                                    \
>             (arch_jump_label_transform_t)kp2.addr;                     \
>         fake_je->code = attack_addr - (unsigned long)&(fake_je->code); \
>         fake_je->target = stext - (unsigned long)&(fake_je->target);   \
>         arch_jump_label_transform(fake_je, JUMP_LABEL_JMP);            \
>         return 0;                                                      \
>     } while (0)

That's not valid cBPF

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
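
For comparison with the macro quoted above: a classic BPF seccomp filter is only an array of (code, jt, jf, k) instructions. What follows is a minimal sketch of a structural check that accepts only the "giant case statement" shape mentioned in the quoted message (load a word from seccomp_data, compare against constants, return a verdict). It is not the kernel's seccomp_check_filter and not the EL2 analysis described in the thread. The names cbpf_insn, looks_like_case_statement, example_filter and bad_filter are hypothetical, and the syscall numbers 63 and 64 are used purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct sock_filter in <linux/filter.h>;
 * redefined here (hypothetical name) so the sketch builds without
 * kernel headers. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;   /* jump offset if the comparison is true  */
    uint8_t  jf;   /* jump offset if the comparison is false */
    uint32_t k;
};

/* Classic BPF opcode values, as composed in the Linux uapi headers. */
#define OP_LD_W_ABS 0x20u /* BPF_LD  | BPF_W    | BPF_ABS */
#define OP_JEQ_K    0x15u /* BPF_JMP | BPF_JEQ  | BPF_K   */
#define OP_JGT_K    0x25u /* BPF_JMP | BPF_JGT  | BPF_K   */
#define OP_JGE_K    0x35u /* BPF_JMP | BPF_JGE  | BPF_K   */
#define OP_JSET_K   0x45u /* BPF_JMP | BPF_JSET | BPF_K   */
#define OP_RET_K    0x06u /* BPF_RET | BPF_K              */

#define SECCOMP_DATA_SIZE 64u /* sizeof(struct seccomp_data) */

/* Returns 1 if the program starts with a load, ends with a return, and
 * contains nothing but aligned in-bounds loads from seccomp_data,
 * constant comparisons with in-bounds forward targets, and returns.
 * Returns 0 otherwise. Assumption: this subset is what "a giant case
 * statement" means; the real policy compilers may emit more. */
int looks_like_case_statement(const struct cbpf_insn *prog, size_t len)
{
    if (prog == NULL || len < 2)
        return 0;
    if (prog[0].code != OP_LD_W_ABS || prog[len - 1].code != OP_RET_K)
        return 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct cbpf_insn *in = &prog[pc];

        switch (in->code) {
        case OP_LD_W_ABS:
            /* 32-bit load must be aligned and inside seccomp_data. */
            if (in->k % 4 != 0 || in->k > SECCOMP_DATA_SIZE - 4)
                return 0;
            break;
        case OP_JEQ_K:
        case OP_JGT_K:
        case OP_JGE_K:
        case OP_JSET_K:
            /* Both branch targets must stay inside the program. */
            if (pc + 1 + in->jt >= len || pc + 1 + in->jf >= len)
                return 0;
            break;
        case OP_RET_K:
            break;
        default:
            return 0; /* anything else is outside the expected shape */
        }
    }
    return 1;
}

/* Illustrative filter: allow syscall numbers 63 and 64, kill otherwise. */
static const struct cbpf_insn example_filter[] = {
    { OP_LD_W_ABS, 0, 0, 0 },           /* A = seccomp_data.nr       */
    { OP_JEQ_K,    2, 0, 63 },          /* nr == 63 -> allow         */
    { OP_JEQ_K,    1, 0, 64 },          /* nr == 64 -> allow         */
    { OP_RET_K,    0, 0, 0x80000000u }, /* SECCOMP_RET_KILL_PROCESS  */
    { OP_RET_K,    0, 0, 0x7fff0000u }, /* SECCOMP_RET_ALLOW         */
};

/* Rejected: the true branch jumps past the end of the program. */
static const struct cbpf_insn bad_filter[] = {
    { OP_LD_W_ABS, 0, 0, 0 },
    { OP_JEQ_K,    5, 0, 63 },
    { OP_RET_K,    0, 0, 0x7fff0000u },
};
```

A check of this kind only constrains the shape of a filter. It says nothing about whether the allowed syscall set is the one the policy author intended, which is the gap the quoted message points at.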