Re: [PATCH RFC bpf-next v1 4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Sun, 16 Apr 2023 06:45:34 +0200

On Fri, Apr 14, 2023 at 01:41:52AM CEST, Alexei Starovoitov wrote:
> On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > > > >
> > > > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > > > +		if (bpf_get_exception())
> > > > > > > > +			return -EJUKEBOX;
> > > > > > >
> > > > > > > This is too slow.
> > > > > > > We cannot afford a call and conditional here.
> > > > > >
> > > > > > There are two more options here: have two variants, one with and without the
> > > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > > > exception state, then it's a bit less costly.
> > > > > >
> > > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > > > the overhead of indirect call was not acceptable.
> > > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > > > but my preference would be to disallow throw in callbacks.
> > > > > > > timer cb, rbtree_add cb are typically small.
> > > > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > > > >
> > > > > > If the only reason to avoid them is the added performance cost, we can work
> > > > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > > > One of the other reasons is that if it's being used within a shared static
> > > > > > function that both main program and callbacks call into, it will be a bit
> > > > > > annoying that it doesn't work in one context.
> > > > >
> > > > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > > > exception cb and rbtree_add. All three are special cases.
> > > >
> > > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> > > >
> > > > > There is nothing to unwind in the timer case.
> > > > > Certinaly not allowed to rethrow in exception cb.
> > > > > less-like rbtree_add should be tiny. And it's gotta to be fast.
> > > >
> > > > I agree for some of the cases above they do not matter too much. The main one
> > > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > > > people may do siginificant work in the callback.
> > > >
> > > > I think we can make it zero cost for programs that don't use it (even for the
> > > > kernel code), I was just hoping to be able to support it everywhere as a generic
> > > > helper to abort program execution without any special corner cases during usage.
> > > > The other main point was about code sharing of functions which makes use of
> > > > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > > > worth it in the end.
> > >
> > > I think the unexpected run-time slowdown due to checks is a bigger problem.
> > > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> > > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.
> >
> > I 100% agree, slow down is a big problem and a downside of the current version.
> > They need to be as lightweight as possible. It's too costly right now in this
> > set.
> >
> > > The run-time matter more than ability to use assert in all conditions. I think
> > > we'd need two flavors of bpf_assert() asm macros: with and without
> > > bpf_throw(). They seem to be useful without actual throw. They will help the
> > > verifier understand the ranges of variables in both cases. The macros are the
> > > 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> > > release of resource is a cost of using macros without explicit control flow in
> > > the bpf programs. The macros without throw might be good enough in many cases.
> >
> > Ok, I'll update to allow for both variants. But just to confirm, do you want to
> > shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
> > whether you agree or disagree it's necessary.
> >
> > I'll try the prog->aux->exception route, but I must say that we're sort of
> > trading off simpler semantics/behavior for speedup in that case (which is
> > necessary, but it is a cost IMO). I'll post a version using that, but I'll also
> > add comparisons with the variant where we spill ptr to exception state and
> > load+jmp. It's an extra instruction but I will try to benchmark and see how much
> > difference it causes in practice (probably over the XDP benchmark using such
> > exceptions, since that's one of the most important performance-critical use
> > cases). If you agree, let's double down on whatever approach we choose after
> > analysing the difference?
>
> I think performance considerations dominate implementation and ease of use.
> Could you describe how 'spill to exception state' will look like?

It was about spilling pointer to exception state on entry to program (so just
main subprog) at a known offset on stack, then instead of LD + JMP, LD from
stack + LD + JMP where the check is needed.

> I think the check after bpf_call insn has to be no more than LD + JMP.
> I was thinking whether we can do static_key like patching of the code.
> bpf_throw will know all locations that should be converted from nop into check
> and will do text_poke_bp before throwing.
> Maybe we can consider offline unwind and release too. The verifier will prep
> release tables and throw will execute them. BPF progs always have frame pointers,
> so walking the stack back is relatively easy. Release per callsite is hard.
>

After some thought, I think offline unwinding is the way to go. That means no
rewrites for the existing code, and we just offload all the cost to the slow
path (bpf_throw call) as it should be. There would be no cost at runtime (except
the conditional branch, which should be well predicted). The current approach
was a bit simpler so I gave it a shot first but I think it's not the way to go.
I will rework the set.

> As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP.

Got it.