Re: Are BPF programs preemptible?

On Wed, Jan 25, 2023 at 2:04 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Jan 24, 2023 at 9:38 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 24, 2023 at 7:24 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 24, 2023 at 7:47 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 24, 2023 at 2:30 PM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jan 23, 2023 at 2:03 PM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Jan 23, 2023 at 11:25 PM Jakub Sitnicki
> > > > > > <jakub@xxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 23, 2023 at 11:01 PM +02, Yaniv Agman wrote:
> > > > > > > > On Mon, Jan 23, 2023 at 10:06 PM Martin KaFai Lau
> > > > > > > > <martin.lau@xxxxxxxxx> wrote:
> > > > > > > >>
> > > > > > > >> On 1/23/23 9:32 AM, Yaniv Agman wrote:
> > > > > > > >> >>> interrupted the first one. But even then, I will need to find a way to
> > > > > > > >> >>> know if my program currently interrupts the run of another program -
> > > > > > > >> >>> is there a way to do that?
> > > > > > > >> May be a percpu atomic counter to see if the bpf prog has been re-entered on the
> > > > > > > >> same cpu.
> > > > > > > >
> > > > > > > > Not sure I understand how this will help. If I want to save local
> > > > > > > > program data in a percpu map and I see that the counter is bigger than
> > > > > > > > zero, should I ignore the event?
> > > > > > >
> > > > > > > map_update w/ BPF_F_LOCK disables preemption, if you're after updating
> > > > > > > an entry atomically. But it can't be used with PERCPU maps today.
> > > > > > > Perhaps that's needed now too.
> > > > > >
> > > > > > Yep. I think what is needed here is the ability to disable preemption
> > > > > > from the bpf program - maybe even adding a helper for that?
> > > > >
> > > > > I'm not sure what the issue is here.
> > > > > Old preempt_disable() doesn't mean that one bpf program won't ever
> > > > > be interrupted by another bpf prog.
> > > > > Like networking bpf prog in old preempt_disable can call into something
> > > > > where there is a kprobe and another tracing bpf prog will be called.
> > > > > The same can happen after we switched to migrate_disable.
> > > >
> > > > One difference here is that, in what you describe, the programmer can
> > > > know in advance which functions might call others and avoid that or
> > > > use other percpu maps. But if preemption can happen between functions
> > > > which are not related to one another (which don't have a
> > > > caller/callee relation), then the programmer has no control over it.
> > >
> > > Could you give a specific example?
> >
> > Sure. I can give two examples from places where we saw such corruption:
> >
> > 1. We use kprobes to trace some LSM hooks, e.g. security_file_open,
> > and a percpu scratch map to prepare the event for submit. When we also
> > added a TRACEPOINT to trace sched_process_free (where we also use this
> > scratch percpu map), the security_file_open events got corrupted and
> > we didn't know what was happening (it was very hard to debug)
>
> bpf_lsm_file_open is sleepable.
> We never did preempt_disable there.
> kprobe on security_file_open works, but it's a bit of a hack.
> With preempt->migrate
> the delayed_put_task_struct->trace_sched_process_free can happen
> on the same cpu if prog gets preempted in a preemptible kernel, but something
> is fishy here.
> Non-sleepable bpf progs always run in rcu cs while
> delayed_put_task_struct is call_rcu(),
> so if you've used a per-cpu scratch buffer for the duration
> of bpf prog (either old preempt_disable or new migrate_disable)
> the trace_sched_process_free won't be called until prog is finished.
>
> If you're using per-cpu scratch buffer to remember the data in one
> execution of kprobe-bpf and consuming in the next then all bets are off.
> This is going to break sooner or later.
>

Yes, I agree this is fishy, but we only use the per-cpu scratch map
for the duration of a bpf program, assuming tail calls are considered
part of the same program.
By placing bpf_printk calls in the security_file_open kprobe, we even
pinpointed the place where preemption happens: right after calling the
bpf_perf_event_output() helper. Before the call to the helper the
buffer was fine, but after it the buffer was corrupted (and there were
more lines in the program after the call to this helper).
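[For context, the scratch-map pattern described above looks roughly like
this on the BPF side. This is only a sketch with made-up map and function
names, not Tracee's actual code, and it needs libbpf's headers to build:]

```c
/* Sketch of a per-cpu scratch buffer shared with a perf buffer submit.
 * All names are hypothetical; shown only to illustrate the discussion. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define SCRATCH_SIZE (32 * 1024)

struct scratch {
    char data[SCRATCH_SIZE];
};

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct scratch);
} scratch_map SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(int));
} events SEC(".maps");

SEC("kprobe/security_file_open")
int trace_file_open(struct pt_regs *ctx)
{
    __u32 zero = 0;
    struct scratch *s = bpf_map_lookup_elem(&scratch_map, &zero);

    if (!s)
        return 0;
    /* ... fill s->data incrementally through the pointer ... */
    /* If another program on this CPU touches the same map before the
     * submit below, the buffer can be corrupted. */
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                          s->data, sizeof(s->data));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```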

> > 2. The same was happening when kprobes were combined with cgroup_skb
> > programs to trace network events
>
> irqs can interrupt old preempt_disable, so depending on where kprobe-bpf
> is it can run while cgroup-bpf hasn't finished.
>

Actually, what we saw here is that the ingress skb program, which runs
from ksoftirqd/idle context, corrupts a percpu map shared with a
raw_sys_exit tracepoint used to submit execve/openat events.

> Then there are differences in kernel configs: no preempt, full preempt,
> preempt_rcu and differences in rcu_tasks_trace implementation
> over the years. You can only assume the validity of the pointer
> to bpf per-cpu array and bpf per-cpu hash, but make no assumptions
> about concurrent access to the same map if you share the same map
> across different programs.

I think that is the main point here. Thanks for clarifying this.

> See BPF_F_LOCK earlier suggestion. It might be what you need.

I am not sure how exactly this can help. The per-cpu buffer we use is
32 KB, and we don't use the map_update helper; we update the map
through the pointer returned from the lookup in several places in the
program.
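[For reference, BPF_F_LOCK maps to the bpf_spin_lock machinery on the BPF
side. A rough sketch (hypothetical names) shows why it fits a small struct
updated in a short critical section rather than a 32 KB scratch buffer
written through a long-lived pointer:]

```c
/* Sketch of the spin-lock machinery behind BPF_F_LOCK (hypothetical
 * names). The lock lives inside the map value and guards a short
 * critical section; as noted earlier in the thread, it cannot be
 * used with PERCPU maps today. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct locked_val {
    struct bpf_spin_lock lock;
    __u64 counter;
};

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct locked_val);
} locked_map SEC(".maps");

SEC("kprobe/security_file_open")
int update_locked(void *ctx)
{
    __u32 zero = 0;
    struct locked_val *v = bpf_map_lookup_elem(&locked_map, &zero);

    if (!v)
        return 0;
    bpf_spin_lock(&v->lock);
    v->counter++;                 /* short critical section only */
    bpf_spin_unlock(&v->lock);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```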



