Re: [RFC] Proposal: Static SECCOMP Policies

Maciej Żenczykowski <maze@xxxxxxxxxx> · Mon, 16 Sep 2024 15:50:04 -0700



On Mon, Sep 16, 2024 at 3:18 PM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
>
> Another long email follows. The TL;DR: given the related issues, such
> as changes in cBPF, and some thoughts regarding Google's maintenance of
> seccomp inside Android, Android maintainers should decide to either
> "use minijail" or "use bionic's tools" for compiling policies to BPF.
> Is there any reason multiple seccomp-policy-to-BPF compilers need to
> exist in AOSP (or even, maybe, in Linux's use of seccomp)? Shifting to
> a single project for policy compilation to BPF would remove duplicate
> effort in maintaining these compilers, solve the code page integrity
> issue, and reduce the potential sources of policy compiler errors. See
> below.
>
> On Fri, Sep 13, 2024 at 02:16:40PM GMT, Maciej Żenczykowski wrote:
> > On Fri, Sep 13, 2024 at 10:07 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > > Add a hook to seccomp which triggers/enables hooks in BPF's JIT to instrument
> > > the output machine code page so that EL2 can (1) invert the machine code back
> > > to BPF then (2) check the BPF corresponds to a valid seccomp filter policy.
> >
> > If you care that deeply about this: you could simply turn off jit
> > compilation of cBPF (including seccomp) - but you'll take a
> > performance hit.
> > If you care about performance you could only jit compile *recognized*
> > cBPF programs.
> > Hell, instead of jit-ing them you could replace them with outright
> > (pre)compiled into the kernel native functions that accomplish the
> > same thing.
> > There's probably only somewhere <10 of these in common use / part of
> > the platform.
> > That said, you'd still pay a performance hit for (Chrome web browser
> > style) sandboxes since those policies *will* be updated without os
> > updates.  Similarly with the mainline shipped cBPF code (which does
> > process all packets) - you can't guarantee it won't change.
>
> I am hesitant to opt for turning off JIT, as a few months ago I got a
> warning from Alexei Starovoitov about this approach:
> https://lore.kernel.org/all/CAADnVQJCxFt2R=fbqx1T_03UioAsBO4UXYGh58kJaYHDpMHyxw@xxxxxxxxxxxxxx/
>
> I would be hesitant for Moto (or anyone) to maintain a dynamic list of
> acceptable code pages for each AOSP (or subpackage) release, and the
> list will only grow with time. It would be really difficult, as well,
> for me to even begin to figure out if I have "caught" all of them, since
> Qualcomm services use seccomp and I have no idea if I am testing every
> edge condition in the phone while developing this.
>
> In lieu of knowing exactly what these code pages will be and the dangers
> or growing lack of support for the BPF interpreter: the current
> SYS_seccomp user environment, e.g. libminijail or bionic's libc or
> whatever Qualcomm is using, ends up being the de facto specification of
> the seccomp BPF "language", rather than a translation layer to a
> standard policy file format which uniformly gets translated to BPF for
> the kernel's consumption. The disconnect is that the current seccomp.c
> semantics _only_ encode the cBPF operations and some sensibility
> checking for the ranges of referenced memory, but seccomp.c is currently
> not sufficient to provide an EL2-enforceable or Android-enforceable
> contract on the integrity of the desired policy.
>
> For example, I took some measurements today on-device, and the three
> programs that were triggering EL2-level code page integrity failures in
> the basic case follow the same general structure:
>
> - Load system call _NR_ definition values
> - Generate "priority" JEQ statements (opcode 0x15)
> - Generate additional jump statements (opcode 0xa5, 0x35, etc)
> - Standard(ish) suffix consisting of loads/movs/exits (opcode 0x61,0xb4,0x95)
>
> But there's nothing to guarantee that this is what will happen for
> arbitrary programs with SYS_seccomp permission, as they could be using
> different generators for their BPF. For example,
> compile_seccomp_policy.py under the minijail project and genseccomp.py
> under the bionic libc project solve this same problem in two different
> ways: both generate a couple of _NR_ checks and jump statements, but
> with different Python code.
>
> Can Android just say "use minijail" or "use bionic's tools" and call it
> a day, similar to the intent system, or binder, or any number of the
> ecosystem "hard rules"? That way, Google also does not have to maintain
> the two separate projects doing the same thing, we can figure out what
> the heck Qualcomm is doing, and I can sleep better at night. Seccomp is
> not C, there's not the fight over clang vs gcc: system call numbers are
> baked into struct seccomp_data, so why bother with multiple (potentially
> buggy and differing in flexibility) ways of compiling the desired policy
> into BPF? Maybe this is too opinionated, but the nice world we would get
> as a result is one where every single code page in Android's kernel
> would be verifiable (and, if it were adopted in Linux generally, in most
> ARM systems).
>
> Regardless, the clear hack, to me, is that when EL2 gets a code page
> integrity failure on one of these seccomp pages, for now I do some
> simple binary analysis to check that the code page consists only of what
> is effectively a giant case statement. Over time, this needs to be
> refined to ensure the adversary has not mucked with the policy in a
> valid way, like seccomp_check_filter in kernel/seccomp.c but better.
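
The "giant case statement" check described above could be sketched roughly as follows. This is only an illustration at the BPF opcode level, not the EL2 machine-code analysis itself. `struct insn`, `opcode_allowed`, `is_jump`, and `looks_like_case_statement` are hypothetical names, and the allowlist is just the opcodes named in the quoted message (the "etc" there suggests the real set is larger).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte BPF instruction layout;
 * deliberately not the kernel's own struct bpf_insn. */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t regs;   /* dst/src register nibbles, ignored in this sketch */
    int16_t off;    /* jump offset, relative to the next instruction */
    int32_t imm;    /* immediate operand */
};

/* Assumption: only the opcodes named in the quoted message are allowed. */
int opcode_allowed(uint8_t code)
{
    switch (code) {
    case 0x61: /* 32-bit load from memory */
    case 0x15: /* jump if equal to immediate */
    case 0xa5: /* jump if less than immediate */
    case 0x35: /* jump if greater than or equal to immediate */
    case 0xb4: /* 32-bit move of immediate */
    case 0x95: /* exit */
        return 1;
    default:
        return 0;
    }
}

int is_jump(uint8_t code)
{
    return code == 0x15 || code == 0xa5 || code == 0x35;
}

/* Returns 1 if prog is a forward-only chain of allowed opcodes that ends
 * in an exit instruction, 0 otherwise. */
int looks_like_case_statement(const struct insn *prog, size_t n)
{
    assert(prog != NULL);
    if (n == 0 || prog[n - 1].code != 0x95)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (!opcode_allowed(prog[i].code))
            return 0;
        if (is_jump(prog[i].code)) {
            /* Target is i + 1 + off; reject backward and out-of-range jumps. */
            if (prog[i].off < 0 || i + 1 + (size_t)prog[i].off >= n)
                return 0;
        }
    }
    return 1;
}
```

A check like this only establishes the shape of the program. It says nothing about whether the compared syscall numbers and return values match the intended policy, which is the refinement the paragraph above says is still needed.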
>
> > I guess for the mainline shipped cBPF programs we could technically
> > probably swap them for eBPF.  Taking a quick glance at uses of
> > BpfClassic.h in aosp I see 6 socket filter cBPF programs, of which
> > only 1 is dynamic (for matching clat IP addresses), so the remaining 5
> > are probably trivial to eBPF-ify (and thus hide behind selinux
> > restrictions).
>
> clatd, netd, gpuWork, and others turned out to not be an issue (or I
> have not run into any code page errors) yet, maybe because I'm running
> drivers for the kernel protection at the book-ends of the kernel boot
> process: one prior to any memory allocation so that it can ensure pages
> get allocated in regions permissible for the Snapdragon chipset's
> performance constraints on EL2 write checks, and the second after the
> allocation of all boot-time kernel modules and BPF program loads, since
> at that point I can check the allocated pages w.r.t. SHA256 hashes
> computed (considering holes for self-patching and static_keys) at build
> time using the .ko files, only because I am paranoid someone will
> circumvent the existing verified boot routines.
>
> As mentioned, I will work with Motorola to see if I can figure out a
> permissive license for the EL2 components for this part, especially
> considering I have seen ... questionable promises ... regarding this
> subject in my research and an apparent lack of acknowledgement of issues
> like dynamic data structures and seccomp filters from others (not Google)
> promising hypervisor-enforced code integrity. Thankfully, due to GPL-2.0
> the EL1 drivers will be open source. I will share them once they are
> ready with testcases of existing exploits for page table modification,
> code page modification, system control register modification, kworker
> queue manipulation, BPF page manipulation, like the below:
>
> #define MODIFY_KERNEL_CODE                                                     \
>         do {                                                                   \
>                 fake_je = (struct jump_entry *)kallsyms_lookup_name_ind(       \
>                         "spectre_bhb_state");                                  \
>                 attack_addr = kallsyms_lookup_name_ind("udp_recvmsg");         \
>                 if (register_kprobe(&kp2)) {                                   \
>                         return -1;                                             \
>                 }                                                              \
>                 arch_jump_label_transform =                                    \
>                         (arch_jump_label_transform_t)kp2.addr;                 \
>                 fake_je->code = attack_addr - (unsigned long)&(fake_je->code); \
>                 fake_je->target = stext - (unsigned long)&(fake_je->target);   \
>                 arch_jump_label_transform(fake_je, JUMP_LABEL_JMP);            \
>                 return 0;                                                      \
>         } while (0)

That's not valid cBPF
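
For contrast, a classic BPF seccomp filter is an array of `struct sock_filter` instructions, not kernel C. The following is a minimal sketch, assuming a Linux build environment with the UAPI headers available. The policy itself (allow read, write, and exit_group, kill everything else) and the names `example_filter` and `example_prog` are made up for illustration, and a real filter would also check `seccomp_data.arch` before trusting the syscall number.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Illustrative policy only: allow three syscalls, kill on anything else. */
struct sock_filter example_filter[] = {
    /* A = seccomp_data.nr */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* Each JEQ jumps forward to the ALLOW return on a match,
     * and falls through to the next check otherwise. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_read, 3, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_write, 2, 0),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_exit_group, 1, 0),
    /* Default action when nothing matched. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* The program descriptor that would be handed to the kernel. */
struct sock_fprog example_prog = {
    .len = sizeof(example_filter) / sizeof(example_filter[0]),
    .filter = example_filter,
};
```

Installing it would normally go through `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)` followed by `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &example_prog)`. That step is left out here so the sketch does not sandbox whatever process runs it.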

--
Maciej Żenczykowski, Kernel Networking Developer @ Google