On Fri, Apr 14, 2023 at 01:41:52AM CEST, Alexei Starovoitov wrote: > On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote: > > On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote: > > > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote: > > > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote: > > > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote: > > > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx, > > > > > > > > > > > > > > > > for (i = 0; i < nr_loops; i++) { > > > > > > > > ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0); > > > > > > > > + if (bpf_get_exception()) > > > > > > > > + return -EJUKEBOX; > > > > > > > > > > > > > > This is too slow. > > > > > > > We cannot afford a call and conditional here. > > > > > > > > > > > > There are two more options here: have two variants, one with and without the > > > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions > > > > > > which pass false/true) and dispatch to the appropriate one based on if callback > > > > > > throws or not (so the cost is not paid for current users at all). Secondly, we > > > > > > can avoid repeated calls by hoisting the call out and save the pointer to > > > > > > exception state, then it's a bit less costly. > > > > > > > > > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because > > > > > > > the overhead of indirect call was not acceptable. > > > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum. > > > > > > > With prog->aux->exception[] approach it might be ok-ish, > > > > > > > but my preference would be to disallow throw in callbacks. > > > > > > > timer cb, rbtree_add cb are typically small. > > > > > > > bpf_loop cb can be big, but we have open coded iterators now. > > > > > > > So disabling asserts in cb-s is probably acceptable trade-off. > > > > > > > > > > > > If the only reason to avoid them is the added performance cost, we can work > > > > > > towards eliminating that when bpf_throw is not used (see above). I agree that > > > > > > supporting it everywhere means thinking about a lot more corner cases, but I > > > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere. > > > > > > One of the other reasons is that if it's being used within a shared static > > > > > > function that both main program and callbacks call into, it will be a bit > > > > > > annoying that it doesn't work in one context. > > > > > > > > > > I hope with open coded iterators the only use case for callbacks will be timers, > > > > > exception cb and rbtree_add. All three are special cases. > > > > > > > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain. > > > > > > > > > There is nothing to unwind in the timer case. > > > > > Certinaly not allowed to rethrow in exception cb. > > > > > less-like rbtree_add should be tiny. And it's gotta to be fast. > > > > > > > > I agree for some of the cases above they do not matter too much. The main one > > > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where > > > > people may do siginificant work in the callback. > > > > > > > > I think we can make it zero cost for programs that don't use it (even for the > > > > kernel code), I was just hoping to be able to support it everywhere as a generic > > > > helper to abort program execution without any special corner cases during usage. > > > > The other main point was about code sharing of functions which makes use of > > > > assertions. But I'm ok with dropping support for callbacks if you think it's not > > > > worth it in the end. > > > > > > I think the unexpected run-time slowdown due to checks is a bigger problem. > > > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time > > > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception. > > > > I 100% agree, slow down is a big problem and a downside of the current version. > > They need to be as lightweight as possible. It's too costly right now in this > > set. > > > > > The run-time matter more than ability to use assert in all conditions. I think > > > we'd need two flavors of bpf_assert() asm macros: with and without > > > bpf_throw(). They seem to be useful without actual throw. They will help the > > > verifier understand the ranges of variables in both cases. The macros are the > > > 99% of the feature. The actual throw mechanism is 1%. The unwinding and > > > release of resource is a cost of using macros without explicit control flow in > > > the bpf programs. The macros without throw might be good enough in many cases. > > > > Ok, I'll update to allow for both variants. But just to confirm, do you want to > > shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on > > whether you agree or disagree it's necessary. > > > > I'll try the prog->aux->exception route, but I must say that we're sort of > > trading off simpler semantics/behavior for speedup in that case (which is > > necessary, but it is a cost IMO). I'll post a version using that, but I'll also > > add comparisons with the variant where we spill ptr to exception state and > > load+jmp. It's an extra instruction but I will try to benchmark and see how much > > difference it causes in practice (probably over the XDP benchmark using such > > exceptions, since that's one of the most important performance-critical use > > cases). If you agree, let's double down on whatever approach we choose after > > analysing the difference? > > I think performance considerations dominate implementation and ease of use. > Could you describe how 'spill to exception state' will look like? It was about spilling pointer to exception state on entry to program (so just main subprog) at a known offset on stack, then instead of LD + JMP, LD from stack + LD + JMP where the check is needed. > I think the check after bpf_call insn has to be no more than LD + JMP. > I was thinking whether we can do static_key like patching of the code. > bpf_throw will know all locations that should be converted from nop into check > and will do text_poke_bp before throwing. > Maybe we can consider offline unwind and release too. The verifier will prep > release tables and throw will execute them. BPF progs always have frame pointers, > so walking the stack back is relatively easy. Release per callsite is hard. > After some thought, I think offline unwinding is the way to go. That means no rewrites for the existing code, and we just offload all the cost to the slow path (bpf_throw call) as it should be. There would be no cost at runtime (except the conditional branch, which should be well predicted). The current approach was a bit simpler so I gave it a shot first but I think it's not the way to go. I will rework the set. > As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP. Got it.