On Fri, 2024-02-16 at 22:59 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> > Also, what do you think about the following hack:
> > - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)";
> > - replace all calls to bpf_throw() with calls to bpf_throw_r()
> >   (r1-r5 do not have to be preserved anyways).
> > Thus avoid necessity to introduce the trampoline.
> >
>
> I think we can do such a thing as well, but there are other tradeoffs.
>
> Do you mean that R6 to R9 would be copied to R1 to R5? We will have to
> special case such calls in each architecture's JIT, and add extra code
> to handle it, since fixups from the verifier would also need to pass
> the 6th argument, the cookie value to the bpf_throw call, which can't
> fit in the 5 argument limit for existing kfuncs. I did contemplate
> this solution but then decided against it for these reasons.
>
> One of the advantages of this bpf_throw_tramp stuff is that it does
> not increase code size for all callees, by doing the saving only when
> subprog is called. We can do something similar for bpf_throw_r, but it
> would be in architecture specific code in JIT or some arch_bpf_throw_r
> instead.
>
> Let me know if you suggested something different than what I understood above.

Forgot about cookie, however R6-R9 fit in R2-R5, so the cookie would be fine.
arch_bpf_throw_r() that saves R6-R9 right after the call is probably
better than plain bpf register copying.
But you are correct that trampoline allows uniform processing in
arch_bpf_cleanup_frame_resource(), so it would be less C code
to implement this feature in the end.