On Sat, May 29, 2021 at 9:49 AM Grant Seltzer Richman <grantseltzer@xxxxxxxxx> wrote:
>
> Hi all,
>
> I'm trying to reduce stack usage in my bpf program. I moved over to
> using `bpf_core_read()` instead of `bpf_probe_read()` and it appears
> to have made my program exceed the 512 byte stack limit.

bpf_core_read() is almost identical to bpf_probe_read(), except that it
might generate extra register assignments due to CO-RE relocations,
which in turn might cause stack spills due to register pressure, etc.
So my advice would be to try to simplify your code and split it into
sub-programs, easing the stack spill pressure on the compiler. But a
link to example code would probably be a good way to get more
actionable feedback.

>
> Are there any profiler tools or compiler flags I can use to figure out
> what is exactly using up the most memory?

How about running llvm-objdump -d <bpf.o> and seeing which stores use
big negative offsets relative to r10 (which is the frame pointer)?

>
> Additionally, does anyone have good examples they can point me to of
> storing structures in per_cpu maps or local storage mechanisms?

selftests, as always (I don't know about "good examples", but they are
examples nevertheless). See progs/profiler.inc.h in particular and its
use of a per-CPU array as a poor man's heap implementation (the
data_heap map).

>
> Thanks so much!
> Grant
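
To make the llvm-objdump suggestion above a bit more concrete, here is a
minimal userspace sketch that scans disassembly text for r10-relative
accesses and reports the deepest offset. The helper names r10_depth() and
max_stack_depth() are made up for illustration, and the code assumes the
disassembler spells such accesses as "(r10 - N)"; if your llvm-objdump
version prints them differently, the pattern needs adjusting.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the largest N found in "r10 - N" within one line of
 * disassembly, or 0 if the line has no r10-relative access.
 * ASSUMPTION: accesses are printed as "(r10 - N)".
 */
long r10_depth(const char *line)
{
    const char *needle = "r10 - ";
    const char *p = line;
    long deepest = 0;

    while ((p = strstr(p, needle)) != NULL) {
        p += strlen(needle);
        /* base 0 accepts both decimal and 0x-prefixed offsets */
        long off = strtol(p, NULL, 0);
        if (off > deepest)
            deepest = off;
    }
    return deepest;
}

/* Scan a whole disassembly stream; return the deepest offset seen. */
long max_stack_depth(FILE *in)
{
    char line[512];
    long deepest = 0;

    while (fgets(line, sizeof(line), in)) {
        long d = r10_depth(line);
        if (d > deepest)
            deepest = d;
    }
    return deepest;
}
```

Calling max_stack_depth(stdin) from a small main() and piping the
llvm-objdump -d output into it gives a rough upper bound on the stack
used across the object. Each sub-program has its own frame, so printing
r10_depth() per line next to the function labels is more telling than a
single number.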