Re: [PATCH bpf-next v2] Add notrace to queued_spin_lock_slowpath

On Tue, Apr 23, 2024 at 4:58 PM Siddharth Chintamaneni
<sidchintamaneni@xxxxxxxxx> wrote:
>
> > > I agree with the point that notracing all the functions will not
> > > resolve the issue. I could also find a scenario where BPF programs
> > > will end up in a deadlock easily by using bpf_map_pop_elem and
> > > bpf_map_push_elem helper functions called from two different BPF
> > > programs accessing the same map. Here are some issues raised by syzbot
> > > [2, 3].
> >
> > ringbuf and queue/stack maps should probably be fixed now
> > similar to hashmap's __this_cpu_inc_return(*(htab->map_locked...))
> > approach.
> > Both ringbuf and queue_stack can handle failure to lock.
> > That will address the issues spotted by these two syzbot reports.
> > Could you work on such patches?
>
> Just seen your latest patches related to this. Yes, I will work on the
> fixes and send the patches.

Great.

> > > In those cases, the user assumes that these BPF programs will always
> > > trigger. So, to address these types of issues, we are currently
> > > working on a helper-function call-graph-based approach, so that the
> > > verifier gets the ability to decide at load time whether to load a
> > > program or not, ensuring that if a BPF program is attached,
> > > it will be triggered.
> >
> > callgraph approach? Could you share more details?
>
> We are generating a call graph for all the helper functions (including
> the indirect call targets) and filtering out the functions and
> their callees which take locks. If a BPF program attaches to one of
> these lock-taking functions and calls a helper whose call chain
> contains the same lock-taking function, we want to reject it at
> load time. This approach may produce many false positives, but
> it adheres to the principle that "if a BPF program is attached,
> then it should get triggered as expected without affecting the
> kernel." We are planning to work towards this solution and would love
> your feedback on it.

I can see how you can build a call graph across bpf progs and the
helpers/kfuncs they call, but not across arbitrary attach points in the
kernel. So attaching to queued_spin_lock_slowpath or some tracepoint
won't be recognized in such a call graph. Or am I missing something?

In the past we floated an idea to dual-compile the kernel:
once for the native ISA and once for the BPF ISA, so that all of the
kernel code is analyzable. But there was no strong enough use case to
justify it.

> > Not following. The stack overflow issue is being fixed by not using
> > the kernel stack. So each bpf prog will consume a tiny bit of stack
> > to save frame pointer, return address, and callee regs.
>
> (IIRC) in the email chain, it is mentioned that BPF programs are
> going to use a new stack allocated from the heap. With a deep enough
> chain of nested fentry BPF programs, isn't it still possible to
> overflow the stack?

stack overflow is a possibility even without bpf. That's why the stack
is now virtually mapped with guard pages.

> Also, through our call-graph analysis, we found
> that some helpers have really deep call chains. If a BPF program is
> attached at the deepest point in a helper's call chain, isn't there
> still a possibility of overflow? At LPC '23 [1], we presented a
> similar idea of stack switching to prevent overflow in nesting, and
> later realized that while it may give you some extra space and the
> ability to nest more, it does not entirely resolve the issue (tail
> calls will make it even worse).

The problem with any kind of stack switching is the reliability of
stack traces. The kernel must be able to walk the stack on panic, even
in the presence of bugs, and a full stack switch makes that risky.
Livepatching also depends on reliable stack walks.
That's another reason to avoid stack switching.

>
> [1] https://lpc.events/event/17/contributions/1595/attachments/1230/2506/LPC2023_slides.pdf




