On Tue, Jan 24, 2023 at 9:38 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
>
> On Tue, Jan 24, 2023 at 19:24 Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 24, 2023 at 7:47 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 24, 2023 at 14:30 Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Jan 23, 2023 at 2:03 PM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jan 23, 2023 at 23:25 Jakub Sitnicki
> > > > > <jakub@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Jan 23, 2023 at 11:01 PM +02, Yaniv Agman wrote:
> > > > > > > On Mon, Jan 23, 2023 at 22:06 Martin KaFai Lau
> > > > > > > <martin.lau@xxxxxxxxx> wrote:
> > > > > > >>
> > > > > > >> On 1/23/23 9:32 AM, Yaniv Agman wrote:
> > > > > > >> >>> interrupted the first one. But even then, I will need to find a way to
> > > > > > >> >>> know if my program currently interrupts the run of another program -
> > > > > > >> >>> is there a way to do that?
> > > > > > >> Maybe a percpu atomic counter to see if the bpf prog has been re-entered
> > > > > > >> on the same cpu.
> > > > > > >
> > > > > > > Not sure I understand how this will help. If I want to save local
> > > > > > > program data in a percpu map and I see that the counter is bigger than
> > > > > > > zero, should I ignore the event?
> > > > > >
> > > > > > map_update w/ BPF_F_LOCK disables preemption, if you're after updating
> > > > > > an entry atomically. But it can't be used with PERCPU maps today.
> > > > > > Perhaps that's needed now too.
> > > > >
> > > > > Yep. I think what is needed here is the ability to disable preemption
> > > > > from the bpf program - maybe even adding a helper for that?
> > > >
> > > > I'm not sure what the issue is here.
> > > > The old preempt_disable() didn't mean that one bpf program would never
> > > > be interrupted by another bpf prog.
> > > > For example, a networking bpf prog under the old preempt_disable could
> > > > call into something where there is a kprobe, and another tracing bpf
> > > > prog would be called.
> > > > The same can happen after we switched to migrate_disable.
> > >
> > > One difference here is that in what you describe the programmer can
> > > know in advance which functions might call others and avoid that, or
> > > use separate percpu maps. But if preemption can happen between functions
> > > that are unrelated to one another (that have no caller/callee
> > > relationship), then the programmer has no control over it.
> >
> > Could you give a specific example?
>
> Sure. I can give two examples from places where we saw such corruption:
>
> 1. We use kprobes to trace some LSM hooks, e.g. security_file_open,
> and a percpu scratch map to prepare the event for submit. When we also
> added a TRACEPOINT to trace sched_process_free (where we also use this
> percpu scratch map), the security_file_open events got corrupted and
> we didn't know what was happening (it was very hard to debug).

bpf_lsm_file_open is sleepable. We never did preempt_disable there.
A kprobe on security_file_open works, but it's a bit of a hack.

With preempt->migrate, the delayed_put_task_struct->trace_sched_process_free
can happen on the same cpu if the prog gets preempted in a preemptible
kernel, but something is fishy here.
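For reference, Martin's percpu counter idea above could look roughly like
this on the prog side. This is an untested sketch (the map name, hook, and
event logic are made up), and every prog that shares the scratch map would
need to go through the same counter:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* one counter per cpu to detect re-entry on the same cpu */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} busy SEC(".maps");

SEC("kprobe/security_file_open")
int trace_file_open(struct pt_regs *ctx)
{
	__u32 zero = 0;
	__u64 *cnt = bpf_map_lookup_elem(&busy, &zero);

	if (!cnt)
		return 0;
	if (__sync_fetch_and_add(cnt, 1)) {
		/* another prog on this cpu is mid-flight, so the
		 * percpu scratch map may be in use; drop the event
		 */
		__sync_fetch_and_sub(cnt, 1);
		return 0;
	}
	/* ... fill the percpu scratch map and submit the event ... */
	__sync_fetch_and_sub(cnt, 1);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

The tracepoint prog on sched_process_free would take the same
counter before touching the shared scratch map.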
Non-sleepable bpf progs always run in an rcu critical section, while
delayed_put_task_struct is call_rcu(), so if you use a per-cpu scratch
buffer for the duration of a bpf prog (under either the old
preempt_disable or the new migrate_disable), trace_sched_process_free
won't be called until the prog has finished. If you're using a per-cpu
scratch buffer to remember data in one execution of kprobe-bpf and
consume it in the next, then all bets are off. That is going to break
sooner or later.

> 2. same was happening when kprobes were combined with cgroup_skb
> programs to trace network events

irqs can interrupt the old preempt_disable, so depending on where the
kprobe-bpf is, it can run while cgroup-bpf hasn't finished. Then there
are differences between kernel configs: no preempt, full preempt,
preempt_rcu, and differences in the rcu_tasks_trace implementation over
the years.

You can only assume the validity of the pointer into a bpf per-cpu array
or bpf per-cpu hash, but you can make no assumption about concurrent
access to the same map if you share it across different programs.

See the earlier BPF_F_LOCK suggestion. It might be what you need.
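In case it helps, a minimal sketch of the BPF_F_LOCK route (untested; the
struct, map, and prog names are made up). It needs a struct bpf_spin_lock
embedded in the map value, it works with regular maps only (not PERCPU, as
Jakub noted), and tracing prog types cannot use spin-lock maps today, so
the example uses cgroup_skb:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct event_scratch {
	struct bpf_spin_lock lock;
	__u64 payload[8];
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH); /* not PERCPU: BPF_F_LOCK doesn't support it */
	__uint(max_entries, 1024);
	__type(key, __u32);
	__type(value, struct event_scratch);
} shared SEC(".maps");

SEC("cgroup_skb/egress")
int update_under_lock(struct __sk_buff *skb)
{
	struct event_scratch val = {};
	__u32 key = 0;

	val.payload[0] = skb->len;
	/* copies val into the map entry while the entry's spin lock is held */
	bpf_map_update_elem(&shared, &key, &val, BPF_F_LOCK);
	return 1;
}

char LICENSE[] SEC("license") = "GPL";

User space can then read a consistent copy of the entry with
bpf_map_lookup_elem_flags(map_fd, &key, &val, BPF_F_LOCK).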