On Mon, Jan 30, 2023 at 8:01 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > > On 1/30/23 1:12 PM, Ori Glassman wrote: > > Hi everyone, > > > > > > Since the patch to disable process migration instead of disabling preemption in eBPF programs, the latter are potentially susceptible to a reschedule mid-execution. Ori, security@kernel alias is not the place to ask bpf related questions. Yaniv, from your team, already brought it up here: https://lore.kernel.org/bpf/CAADnVQ++LzKt9Q-GtGXknVBqyMqY=vLJ3tR3NNGG3P66gvVCFQ@xxxxxxxxxxxxxx/ You cannot assume that different bpf progs attached to various events like tracepoints and lsm hooks won't overlap. It's a bug in your program. Nothing else. > Thanks for the report, Ori. We'll take a look and get back to you (also Cc'ing KP and Alexei > for visibility in here). > > > bpf_perf_event_output helper in an eBPF program, can trigger irq_work->arch_irq_work_raise, which will send an ipi to the current CPU, which may reschedule. > > > > By hooking a tracepoint such as sched_process_free that runs within an interrupt context, an execution flow (of a certain CPU) like the following may occur: > > - start execution of lsm/<some_func> and executing bpf_perf_event_output within the hook code > > - while executing bpf_perf_event_output, gets rescheduled and runs tp/sched_process_free > > - tp/sched_process_free returns and the CPU continues execution of lsm/<some_func> > > > > Using per-CPU data is a known issue in this kind of environment [1], this is also relevant for per-task local storage implemented in v5.11 [2]. > > > > This is risky in general but becomes particularly dangerous when used in LSM modules (such as apparmor), since the block/allow logic may rely on shared storage that may be manipulated mid-execution from the interrupt context and corrupt the decision of the module. > > A very un-harm example of such usage can be seen in bpf selftests [3]. > > > > This becomes a vulnerability if an LSM hook uses a pinned per-task local storage - which is not expected to get corrupted mid execution, since an attacker may load a program such as tp/sched_process_free, and bypass silently the LSM hook by corrupting the self-storage. The flaw is in the per-task local storage, which is not reliable. > > > > An example: > > --------------------------------------------------------------------------- > > // User loads the following program: > > SEC("lsm/bprm_creds_for_exec") > > int BPF_PROG(secure_exec, struct linux_binprm *bprm) > > { > > int *secureexec; > > secureexec = bpf_task_storage_get(&per_task_map, bpf_get_current_task_btf(), 0, BPF_LOCAL_STORAGE_GET_F_CREATE); // assume per_task_map is pinned > > > > if (secureexec && STR_EQUALS(bprm->filename, "some_virus")) // assume this condition is fulfilled > > *secureexec = 1; > > > > // secureexec is now 1 > > ... > > ... > > ... > > bpf_perf_event_output(); > > ... > > ... > > ... > > > > // secureexec is expected to be 1, but was changed from the interrupted context and is now 0 > > bpf_bprm_opts_set(bprm, *secureexec); > > > > // the binary "some_virus" will run without AT_SECURE > > > > return 0; > > } > > > > // The attacker code: > > SEC("raw_tracepoint/sched_process_free") > > int tracepoint__sched_sched_process_free(struct bpf_raw_tracepoint_args *ctx) > > { > > int *secureexec; > > secureexec = bpf_task_storage_get(&per_task_map, bpf_get_current_task_btf(), 0, BPF_LOCAL_STORAGE_GET_F_CREATE); // uses per_task_map pinned map that was defined in a different eBPF program > > if (secureexec) > > *secureexec = 0; // always turn off secureexec > > > > return 0; > > } These two programs access some task local storage. This code racy regardless of preempt_disable vs migrate_disable. bpf_task_storage_get() of the same task can run on different cpus. Whether trace_sched_process_free and security_bprm_creds_for_exec can happen on different cpus is kernel implementation detail. There looks to be another bug in the above: doing bpf_get_current_task_btf from raw_tracepoint/sched_process_free will return task_struct of the worker thread. I don't think it's the one you want. Anyway, please start a clean thread at bpf@vger with bpf questions. Don't spam security@kernel. --------------------------------------------------------------------------- > > > > Similarly, in cases where the LSM hook decides the return value (deny/allow) based on data from the local storage, this can also be altered by the attacker, resulting in doing a malicious action that the LSM module should normally block. > > > > > > I'd like to propose a tentative public disclosure date on 02/06/2023 12:00UTC. You must be kidding. > > > > > > [1] https://lwn.net/Articles/836503/ > > [2] https://github.com/torvalds/linux/commit/4cf1bc1f10452065a29d576fc5693fc4fab5b919 > > [3] https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/bpf/progs/bprm_opts.c#L24 > > > > > > Thanks, > > Ori > > >