On Tue, Jul 23, 2024 at 10:09 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
> Discussed with Andrii. I think the following approach should work.
> For each non-static prog, the private stack is allocated including
> that non-static prog and the called static progs. For example,
>    main_prog
>      static_prog_1
>        static_prog_11
>        global_prog
>          static_prog_12
>      static_prog_2
>
> So in verifier we calculate stack size for
>    main_prog
>      static_prog_1
>        static_prog_11
>      static_prog_2
> and
>    global_prog
>      static_prog_12
>
> Let us say the stack size for main_prog like below for each (sub)prog
>    main_prog // stack size 100
>      static_prog_1 // stack size 100
>        static_prog_11 // stack size 100
>      static_prog_2 // stack size 100
> so total static size is 300 so the private stack size will be 300.
> So R9 is calculated like below
>    main_prog
>      R9 = ... // for tailcall reachable, R9 may be original R9 + offset
>               // for non-tailcall reachable, R9 equals the original R9 (based on jit-time allocation).
>      ... R9 ...
>      R9 += 100
>      static_prog_1
>        ... R9 ...
>        R9 += 100
>        static_prog_11
>          ... R9 ...
>        R9 -= 100
>      R9 -= 100
>      ... R9 ...
>      R9 += 100
>      static_prog_2
>        ... R9 ...
>      R9 -= 100
>
> Similarly, we can calculate R9 offset for
>    global_prog
>      static_prog_12
> as well.

I don't understand why we differentiate static and global subprogs.
But, mainly, += and -= around the call is suboptimal.
Can we do it as a normal stack does?
Each prog knows how much stack it needs, so it can do:
r9 += stack_depth in the prologue and all accesses are done
as r9 - off.
Then to do a call nothing extra is needed.
The callee will do r9 += its own stack depth.
Whether private stack grows up or down is tbd.

The challenge is how to supply proper r9 on entry into the main prog.
Potentially a task for bpf trampoline, and kprobe/tp/etc will
need special hack in bpf_dispatcher_nop_func.
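
To make the prologue-based idea concrete, here is a rough user-space sketch in C.
It is not JIT or verifier code, only a model of the r9 arithmetic. The assumptions:
the private stack grows up (that part is still tbd), each prog undoes its own
r9 adjustment on exit (only the prologue is spelled out above, so the epilogue is
a guess), and the helper names (prologue, epilogue, run_main_prog, priv_stack,
high_water) are made up for illustration. The sizes come from the 100-bytes-per-prog
example in the quoted mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes from the example in this thread; all names here are hypothetical. */
#define STACK_DEPTH     100	/* stack depth of each (sub)prog */
#define PRIV_STACK_SIZE 300	/* main_prog -> static_prog_1 -> static_prog_11 */

static uint8_t priv_stack[PRIV_STACK_SIZE];
static uint8_t *r9;		/* stands in for the R9 register */
static size_t high_water;	/* deepest private stack usage seen so far */

/* Prologue: reserve this prog's frame. The stack grows up in this sketch. */
static void prologue(size_t depth)
{
	size_t used = (size_t)(r9 - priv_stack);

	assert(used + depth <= PRIV_STACK_SIZE);
	r9 += depth;
	if (used + depth > high_water)
		high_water = used + depth;
}

/* Epilogue: release the frame (assumed; only the prologue is described). */
static void epilogue(size_t depth)
{
	r9 -= depth;
}

static void static_prog_11(void)
{
	prologue(STACK_DEPTH);
	r9[-8] = 11;		/* every frame access is r9 - off */
	epilogue(STACK_DEPTH);
}

static void static_prog_1(void)
{
	prologue(STACK_DEPTH);
	r9[-8] = 1;
	static_prog_11();	/* nothing extra is done around the call */
	assert(r9[-8] == 1);	/* the callee used its own frame, not ours */
	epilogue(STACK_DEPTH);
}

static void static_prog_2(void)
{
	prologue(STACK_DEPTH);
	r9[-8] = 2;
	epilogue(STACK_DEPTH);
}

/* Models main_prog; returns the high-water mark of the private stack in bytes. */
size_t run_main_prog(void)
{
	r9 = priv_stack;	/* the "proper r9 on entry" someone must supply */
	high_water = 0;

	prologue(STACK_DEPTH);
	r9[-8] = 0;
	static_prog_1();
	static_prog_2();
	epilogue(STACK_DEPTH);

	assert(r9 == priv_stack);	/* adjustments are balanced on exit */
	return high_water;
}
```

With these sizes run_main_prog() reports a high-water mark of 300, matching the
private stack size computed in the quoted example, and no call site touches r9.
The only open piece in the sketch is the first line of run_main_prog(), which is
exactly the entry problem mentioned above.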