Re: [RFC] Proposal: Static SECCOMP Policies

Maxwell Bland <mbland@xxxxxxxxxxxx> · Mon, 16 Sep 2024 17:17:54 -0500

Another long email follows. The TL;DR: given related issues such as
changes in cBPF, plus some thoughts on Google's maintenance of seccomp
inside Android, Android maintainers should decide to either "use
minijail" or "use bionic's tools" for compiling policies to BPF. Is
there any reason multiple seccomp-policy-to-BPF compilers need to exist
in AOSP (or even, maybe, in Linux's use of seccomp)? Shifting to a
single project for policy compilation would remove the duplicated
effort of maintaining several such compilers, solve the code page
integrity issue, and reduce potential sources of policy compiler
errors. See below.

On Fri, Sep 13, 2024 at 02:16:40PM GMT, Maciej Żenczykowski wrote:
> On Fri, Sep 13, 2024 at 10:07 AM Maxwell Bland <mbland@xxxxxxxxxxxx> wrote:
> > Add a hook to seccomp which triggers/enables hooks in BPF's JIT to instrument
> > the output machine code  page so that EL2 can (1) invert the machine code back
> > to BPF then (2) check the BPF corresponds to a valid seccomp filter policy.
>
> If you care that deeply about this: you could simply turn of jit
> compilation of cBPF (including seccomp) - but you'll take a
> performance hit.
> If you care about performance you could only jit compile *recognized*
> cBPF programs.
> Hell, instead of jit-ing them you could replace them with outright
> (pre)compiled into the kernel native functions that accomplish the
> same thing.
> There's probably only somewhere <10 of these in common use / part of
> the platform.
> That said, you'd still pay a performance hit for (Chrome web browser
> style) sandboxes since those policies *will* be updated without os
> updates.  Similarly with the mainline shipped cBPF code (which does
> process all packets) - you can't guarantee it won't change.

I am hesitant to turn off JIT, as a few months ago I got a warning from
Alexei Starovoitov about this approach:
https://lore.kernel.org/all/CAADnVQJCxFt2R=fbqx1T_03UioAsBO4UXYGh58kJaYHDpMHyxw@xxxxxxxxxxxxxx/

I would be hesitant for Moto (or anyone) to maintain a dynamic list of
acceptable code pages for each AOSP (or subpackage) release, and the
list will only grow with time. It would be really difficult, as well,
for me to even begin to figure out if I have "caught" all of them, since
Qualcomm services use seccomp and I have no idea if I am testing every
edge condition in the phone while developing this.

Without knowing exactly what these code pages will be, and given the
dangers of, or growing lack of support for, the BPF interpreter, the
current SYS_seccomp user environment, e.g. libminijail or bionic's libc
or whatever Qualcomm is using, ends up being the de facto specification
of the seccomp BPF "language", rather than a translation layer from a
standard policy file format that is uniformly translated to BPF for the
kernel's consumption. The disconnect is that the current seccomp.c
semantics _only_ encode the cBPF operations and some sanity checking of
the ranges of referenced memory; seccomp.c is currently not sufficient
to provide an EL2-enforceable or Android-enforceable contract on the
integrity of the desired policy.

For example, I took some measurements today on-device, and the three
programs that were triggering EL2-level code page integrity failures in
the basic case follow the same general structure:

- Load system call _NR_ definition values
- Generate "priority" JEQ statements (opcode 0x15)
- Generate additional jump statements (opcode 0xa5, 0x35, etc)
- Standard(ish) suffix consisting of loads/movs/exits (opcode 0x61,0xb4,0x95)

But there's nothing to guarantee that this is what will happen for
arbitrary programs with SYS_seccomp permission, as they could be using
different generators for their BPF. For example,
compile_seccomp_policy.py under the minijail project and genseccomp.py
under the bionic libc project solve this same problem in two different
ways: both generate a couple of _NR_ checks and jump statements, but
with different Python code.
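
To make the "same policy, different bytes" point concrete, here is a
minimal sketch of one possible generator shape: a linear chain of JEQ
checks on the syscall number. build_allowlist() is a hypothetical helper
written for this mail, not code from minijail or bionic, and it assumes
the Linux UAPI headers are available. It also skips the
seccomp_data.arch check a real filter needs.

```c
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

/*
 * Hypothetical helper: emit a linear allowlist into out[].
 * Layout: load nr; one JEQ per allowed syscall; KILL; ALLOW.
 * Returns the instruction count, or 0 if out[] is too small or the
 * list is too long for the 8-bit jump offsets of cBPF.
 * NOTE: the seccomp_data.arch check is omitted for brevity.
 */
size_t build_allowlist(const unsigned int *nrs, size_t n,
		       struct sock_filter *out, size_t cap)
{
	size_t i = 0;

	if (n > 255 || cap < n + 3)
		return 0;

	/* A = seccomp_data.nr */
	out[i++] = (struct sock_filter)BPF_STMT(
		BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));

	for (size_t k = 0; k < n; k++) {
		/* On match, skip the remaining compares and the KILL. */
		unsigned char to_allow = (unsigned char)(n - k);

		out[i++] = (struct sock_filter)BPF_JUMP(
			BPF_JMP | BPF_JEQ | BPF_K, nrs[k], to_allow, 0);
	}

	/* Fall through: no syscall matched. */
	out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
						SECCOMP_RET_KILL_PROCESS);
	out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
						SECCOMP_RET_ALLOW);
	return i;
}
```

A second generator could accept exactly the same syscalls with a binary
search built from BPF_JGE instead of a JEQ chain. The policy would be
identical, but the instruction stream, and so the JITed code page, would
not be.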

Can Android just say "use minijail" or "use bionic's tools" and call it
a day, similar to the intent system, or binder, or any number of the
ecosystem "hard rules"? That way, Google also does not have to maintain
the two separate projects doing the same thing, we can figure out what
the heck Qualcomm is doing, and I can sleep better at night. Seccomp is
not C, and there is no clang vs gcc fight here: system call numbers are
baked into struct seccomp_data, so why bother with multiple (potentially
buggy and differing in flexibility) ways of compiling the desired policy
into BPF? Maybe this is too opinionated, but the nice world we would get
as a result is one where every single code page in Android's kernel
would be verifiable (and, if this were adopted in Linux generally, on
most ARM systems too).

Regardless, the clear hack for now, to me, is this: when EL2 gets a code
page integrity failure on one of these seccomp pages, I do some simple
binary analysis to check that the code page consists only of what is
effectively a giant case statement. Over time, this needs to be refined
to ensure the adversary has not swapped the policy for a different one
that is still structurally valid, like seccomp_check_filter in
kernel/seccomp.c but better.
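
As a rough illustration of that kind of shape check, the sketch below
works on decoded BPF instructions rather than on the JITed machine code
that EL2 actually sees, so it is an analogy and not the EL2 code. The
opcode whitelist contains only the values listed earlier in this mail
(0x15, 0x35, 0xa5, 0x61, 0xb4, 0x95); a real check would need a fuller
set. struct insn_view and looks_like_case_statement() are hypothetical
names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one decoded BPF instruction. */
struct insn_view {
	uint8_t code;  /* opcode byte */
	int16_t off;   /* jump offset, relative to the next instruction */
};

/* Conditional jumps seen in the measured filters. */
static bool is_cond_jump(uint8_t code)
{
	return code == 0x15 || code == 0x35 || code == 0xa5;
}

/* Loads and movs seen in the measured filters. */
static bool is_load_or_mov(uint8_t code)
{
	return code == 0x61 || code == 0xb4;
}

/*
 * Accept only programs that end in an exit (0x95), use nothing but the
 * whitelisted opcodes, and whose jumps all go forward and stay in
 * bounds. With no backward jumps there are no loops, which is what
 * makes the program "effectively a giant case statement".
 */
bool looks_like_case_statement(const struct insn_view *prog, size_t len)
{
	if (len == 0 || prog[len - 1].code != 0x95)
		return false;

	for (size_t i = 0; i < len; i++) {
		uint8_t code = prog[i].code;

		if (is_cond_jump(code)) {
			if (prog[i].off < 0 ||
			    i + 1 + (size_t)prog[i].off >= len)
				return false;
		} else if (!is_load_or_mov(code) && code != 0x95) {
			return false;
		}
	}
	return true;
}
```

This only validates structure. It says nothing about whether the
syscall numbers being compared are the ones the policy author intended,
which is the refinement still needed.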

> I guess for the mainline shipped cBPF programs we could technically
> probably swap them for eBPF.  Taking a quick glance at uses of
> BpfClassic.h in aosp I see 6 socket filter cBPF programs, of which
> only 1 is dynamic (for matching clat IP addresses), so the remaining 5
> are probably trivial to eBPF-ify (and thus hide behind selinux
> restrictions).

clatd, netd, gpuWork, and others have turned out not to be an issue so
far (or at least I have not run into any code page errors yet), maybe
because I'm running the kernel protection drivers at the book-ends of
the kernel boot process. The first runs prior to any memory allocation,
so that it can ensure pages get allocated in regions permissible under
the Snapdragon chipset's performance constraints on EL2 write checks.
The second runs after the allocation of all boot-time kernel modules and
BPF program loads, since at that point I can check the allocated pages
against SHA256 hashes computed at build time from the .ko files (with
holes for self-patching and static_keys). I do the latter only because I
am paranoid that someone will circumvent the existing verified boot
routines.
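
The "hash with holes" idea can be sketched as below. To keep it short,
this uses 64-bit FNV-1a as a stand-in digest; it is NOT SHA256 and is
not collision resistant, so treat it purely as an illustration of the
masking step. struct hole and masked_digest() are hypothetical names,
and how the hole list is derived from the .ko files is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a byte range patched at runtime. */
struct hole {
	size_t start;  /* offset of the first byte of the hole */
	size_t len;    /* number of bytes in the hole */
};

static int in_hole(size_t pos, const struct hole *holes, size_t nholes)
{
	for (size_t h = 0; h < nholes; h++)
		if (pos >= holes[h].start &&
		    pos - holes[h].start < holes[h].len)
			return 1;
	return 0;
}

/*
 * Digest buf[0..len) with every byte inside a hole treated as zero, so
 * that runtime patching of those bytes does not change the result.
 * FNV-1a (64-bit) stands in for SHA256 here.
 */
uint64_t masked_digest(const uint8_t *buf, size_t len,
		       const struct hole *holes, size_t nholes)
{
	uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a offset basis */

	for (size_t i = 0; i < len; i++) {
		uint8_t b = in_hole(i, holes, nholes) ? 0 : buf[i];

		h ^= b;
		h *= 0x100000001b3ULL;       /* FNV-1a prime */
	}
	return h;
}
```

The build-time and boot-time sides would both call this with the same
hole list: a change inside a hole leaves the digest unchanged, while a
change anywhere else does not.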

As mentioned, I will work with Motorola to see if I can figure out a
permissive license for the EL2 components for this part, especially
considering I have seen ... questionable promises ... regarding this
subject in my research, and an apparent lack of acknowledgement of
issues like dynamic data structures and seccomp filters from others (not
Google) promising hypervisor-enforced code integrity. Thankfully, due to
GPL-2.0 the EL1 drivers will be open source. I will share them once they
are ready, along with test cases based on existing exploits for page
table modification, code page modification, system control register
modification, kworker queue manipulation, and BPF page manipulation,
like the below:

#define MODIFY_KERNEL_CODE                                                     \
	do {                                                                   \
		fake_je = (struct jump_entry *)kallsyms_lookup_name_ind(       \
			"spectre_bhb_state");                                  \
		attack_addr = kallsyms_lookup_name_ind("udp_recvmsg");         \
		if (register_kprobe(&kp2)) {                                   \
			return -1;                                             \
		}                                                              \
		arch_jump_label_transform =                                    \
			(arch_jump_label_transform_t)kp2.addr;                 \
		fake_je->code = attack_addr - (unsigned long)&(fake_je->code); \
		fake_je->target = stext - (unsigned long)&(fake_je->target);   \
		arch_jump_label_transform(fake_je, JUMP_LABEL_JMP);            \
		return 0;                                                      \
	} while (0)