Re: Interruptable eBPF programs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





From: Ori Glassman <ori.glassman@xxxxxxxxxxx>
Sent: Monday, January 30, 2023 8:45 PM
To: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; Daniel Borkmann <daniel@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
Cc: KP Singh <kpsingh@xxxxxxxxxxxx>
Subject: Re: Interruptable eBPF programs 
 
> From: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> 
> Sent: Monday, January 30, 2023 6:42 PM
> To: Daniel Borkmann <daniel@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
> Cc: Ori Glassman <ori.glassman@xxxxxxxxxxx>; security@xxxxxxxxxx <security@xxxxxxxxxx>; KP Singh <kpsingh@xxxxxxxxxxxx>; Alexei Starovoitov <ast@xxxxxxxxxx>
> Subject: Re: Interruptable eBPF programs
 
> On Mon, Jan 30, 2023 at 8:01 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> >
> > On 1/30/23 1:12 PM, Ori Glassman wrote:
> > > Hi everyone,
> > >
> > >
> > > Since the patch to disable process migration instead of disabling preemption in eBPF programs, the latter are potentially susceptible to a reschedule mid-execution.


> Ori,

> security@kernel alias is not the place to ask bpf related questions.
I apologise if it was confusing, I wasn't asking a question - rather raising a security concern of mine.

> Yaniv, from your team, already brought it up here:
> https://lore.kernel.org/bpf/CAADnVQ++LzKt9Q-GtGXknVBqyMqY=vLJ3tR3NNGG3P66gvVCFQ@xxxxxxxxxxxxxx/

> You cannot assume that different bpf progs attached to various
> events like tracepoints and lsm hooks won't overlap.
> It's a bug in your program. Nothing else.
How can one use bpf_task_storage_get() without the risk of getting corrupted? Say there's a module that consists of 1 simple program, a single LSM hook on bprm_creds_for_exec, that uses a local task storage pinned map. 
An attacker at some point in the future loads his tracepoint program, and maliciously corrupts the local storage *while* the LSM hook is executing, not after or before. What's the bug in the program that consists of the single LSM hook?

> > Thanks for the report, Ori. We'll take a look and get back to you (also Cc'ing KP and Alexei
> > for visibility in here).
> >
> > > bpf_perf_event_output helper in an eBPF program, can trigger irq_work->arch_irq_work_raise, which will send an ipi to the current CPU, which may reschedule.
> > >
> > > By hooking a tracepoint such as sched_process_free that runs within an interrupt context, an execution flow (of a certain CPU) like the following may occur:
> > > - start execution of lsm/<some_func> and executing bpf_perf_event_output within the hook code
> > > - while executing bpf_perf_event_output, gets rescheduled and runs tp/sched_process_free
> > > - tp/sched_process_free returns and the CPU continues execution of lsm/<some_func>
> > >
> > > Using per-CPU data is a known issue in this kind of environment [1], this is also relevant for per-task local storage implemented in v5.11 [2].
> > >
> > > This is risky in general but becomes particularly dangerous when used in LSM modules (such as apparmor), since the block/allow logic may rely on shared storage that may be manipulated mid-execution from the interrupt context and corrupt the decision of the module.
> > > A very un-harm example of such usage can be seen in bpf selftests [3].
> > >
> > > This becomes a vulnerability if an LSM hook uses a pinned per-task local storage - which is not expected to get corrupted mid execution, since an attacker may load a program such as tp/sched_process_free, and bypass silently the LSM hook by corrupting the self-storage. The flaw is in the per-task local storage, which is not reliable.
> > >
> > > An example:
> > > ---------------------------------------------------------------------------
> > > // User loads the following program:
> > > SEC("lsm/bprm_creds_for_exec")
> > > int BPF_PROG(secure_exec, struct linux_binprm *bprm)
> > > {
> > >      int *secureexec;
> > >      secureexec = bpf_task_storage_get(&per_task_map, bpf_get_current_task_btf(), 0, BPF_LOCAL_STORAGE_GET_F_CREATE); // assume per_task_map is pinned
> > >
> > >      if (secureexec && STR_EQUALS(bprm->filename, "some_virus")) // assume this condition is fulfilled
> > >          *secureexec = 1;
> > >
> > > // secureexec is now 1
> > >      ...
> > >      ...
> > >      ...
> > >      bpf_perf_event_output();
> > >      ...
> > >      ...
> > >      ...
> > >
> > >      // secureexec is expected to be 1, but was changed from the interrupted context and is now 0
> > >      bpf_bprm_opts_set(bprm, *secureexec);
> > >
> > >      // the binary "some_virus" will run without AT_SECURE
> > >
> > >      return 0;
> > > }
> > >
> > > // The attacker code:
> > > SEC("raw_tracepoint/sched_process_free")
> > > int tracepoint__sched_sched_process_free(struct bpf_raw_tracepoint_args *ctx)
> > > {
> > >      int *secureexec;
> > >      secureexec = bpf_task_storage_get(&per_task_map, bpf_get_current_task_btf(), 0, BPF_LOCAL_STORAGE_GET_F_CREATE); // uses per_task_map pinned map that was defined in a different eBPF program
> > >      if (secureexec)
> > >          *secureexec = 0; // always turn off secureexec
> > >
> > >      return 0;
> > > }

> These two programs access some task local storage.
I'm talking specifically when the programs are executed by the same task and thus accessing the same local storage.
> This code racy regardless of preempt_disable vs migrate_disable.
> bpf_task_storage_get() of the same task can run on different cpus.
Not at the same time though, right? I'm not concerned about the cases where the map is used in multiple programs - I'm concerned about the cases where it's used locally in a single program, but gets corrupted in a timely manner from the outside by an attacker.


> Whether trace_sched_process_free and security_bprm_creds_for_exec
> can happen on different cpus is kernel implementation detail.

> There looks to be another bug in the above:
> doing bpf_get_current_task_btf from raw_tracepoint/sched_process_free
> will return task_struct of the worker thread.
> I don't think it's the one you want.
That's not what I observed - this is the output of bpf_trace_printk where the execution of the LSM hook got interrupted mid-execution:
----
 chrony-onofflin-12460   [000] d.s.1  2258.804195: bpf_trace_printk: EXECUTION HIJACK(b=2257261931167) # this is from tp/sched_process_free
 chrony-onofflin-12460   [000] d...1  2258.804234: bpf_trace_printk: a=2257261896666. c=2257261971220 # from the lsm hook
-----


> Anyway, please start a clean thread at bpf@vger with bpf questions.
> Don't spam security@kernel.

> ---------------------------------------------------------------------------
> > >
> > > Similarly, in cases where the LSM hook decides the return value (deny/allow) based on data from the local storage, this can also be altered by the attacker, resulting in doing a malicious action that the LSM module should normally block.
> > >
> > >
> > > I'd like to propose a tentative public disclosure date on 02/06/2023 12:00UTC.

> You must be kidding.

> > >
> > >
> > > [1] https://lwn.net/Articles/836503/
> > > [2] https://github.com/torvalds/linux/commit/4cf1bc1f10452065a29d576fc5693fc4fab5b919
> > > [3] https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/bpf/progs/bprm_opts.c#L24
> > >
> > >
> > > Thanks,
> > > Ori
> > >
> >



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux