On Thu, 15 Feb 2024 at 23:11, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote: > > When we perform a bpf_throw kfunc call, callee saved registers in BPF > > calling convention (R6-R9) may end up getting saved and clobbered by > > bpf_throw. Typically, the kernel will restore the registers before > > returning back to the BPF program, but in case of bpf_throw, the > > function will never return. Therefore, any acquired resources sitting in > > these registers will end up getting destroyed if not saved on the > > stack, without any cleanup happening for them. > > Could you please paraphrase this description a bit? > It took me a while to figure out the difference between regular bpf > calls and kfunc calls. Something like: > > - For regular bpf subprogram calls jit emits code that pushes R6-R9 to stack > before jumping into callee. > - For kfunc calls jit emits instructions that do not guarantee that > R6-R9 would be preserved on stack. E.g. for x86 kfunc call is translated > as "call" instruction, which only pushes return address to stack. > Will rephrase like this, thanks for the suggestions. > -- > > Also, what do you think about the following hack: > - declare a hidden kfunc "bpf_throw_r(u64 r6, u64 r7, u64 r8, u64 r9)"; > - replace all calls to bpf_throw() with calls to bpf_throw_r() > (r1-r5 do not have to be preserved anyways). > Thus avoid necessity to introduce the trampoline. > I think we can do such a thing as well, but there are other tradeoffs. Do you mean that R6 to R9 would be copied to R1 to R5? We will have to special case such calls in each architecture's JIT, and add extra code to handle it, since fixups from the verifier would also need to pass the 6th argument, the cookie value to the bpf_throw call, which can't fit in the 5 argument limit for existing kfuncs. I did contemplate this solution but then decided against it for these reasons. One of the advantages of this bpf_throw_tramp stuff is that it does not increase code size for all callees, by doing the saving only when subprog is called. We can do something similar for bpf_throw_r, but it would be in architecture specific code in JIT or some arch_bpf_throw_r instead. Let me know if you suggested something different than what I understood above. > [...] > >