Björn Töpel wrote:
> On Tue, 28 Jan 2020 at 03:14, Palmer Dabbelt <palmerdabbelt@xxxxxxxxxx> wrote:
> >
> > There are four patches here, but only one of them actually does
> > anything. The first patch fixes a BPF selftests build failure on my
> > machine and has already been sent to the list separately. The next
> > three are staged so that the patches that avoid changing any
> > functionality are pulled out from the whole point of the series:
> > two cleanups and then the idea itself.
> >
> > Maybe this is an odd thing to say in a cover letter, but I'm not
> > actually sure this patch set is a good idea. The issue of extra
> > moves after calls came up as I was reviewing some unrelated
> > performance optimizations to the RISC-V BPF JIT. I figured I'd take
> > a whack at performing the optimization in the context of the arm64
> > port just to get a breath of fresh air, and I'm not convinced I
> > like the results.
> >
> > That said, I think I would accept something like this for the
> > RISC-V port, because we're already doing a multi-pass optimization
> > for shrinking function addresses, so it's not as much extra
> > complexity over there. If we do that, we should probably start
> > pulling some of this code into the shared BPF compiler, but we're
> > also opening the doors to more complicated BPF JIT optimizations.
> > Given that the BPF JIT appears to have been designed explicitly to
> > be simple/fast as opposed to performing complex optimization, I'm
> > not sure this is a sane way to move forward.
> >
> Obviously I can only speak for myself and the RISC-V JIT, but given
> that we already have opened the door for more advanced translations
> (branch relaxation, e.g.), I think that this makes sense. At the same
> time we don't want to go all JVM on the JITs. :-P

I'm not against it, although if we start to go down this route I would
want some way to quantify how we are increasing/decreasing load times.

> > I figured I'd send the patch set out as more of a question than
> > anything else. Specifically:
> >
> > * How should I go about measuring the performance of these sorts of
> >   optimizations? I'd like to balance the time it takes to run the
> >   JIT against the time spent executing the program, but I don't
> >   have any feel for what real BPF programs look like or have any
> >   benchmark suite to run. Is there something out there this should
> >   be benchmarked against? (I'd also like to know so I can run those
> >   benchmarks on the RISC-V port.)
>
> If you run the selftests 'test_progs' with -v it'll measure/print the
> execution time of the programs. I'd say *most* BPF programs invoke a
> helper (via call). It would be interesting to see, for, say, the
> selftests, how often the optimization can be performed.
>
> > * Is this the sort of thing that makes sense in a BPF JIT? I guess
> >   I've just realized I turned "review this patch" into a way bigger
> >   rabbit hole than I really want to go down...
> >
> I'd say 'yes'. My hunch, from the workloads I've seen, is that BPF
> programs are usually loaded once and then stay resident for a long
> time, so the JIT time is not super critical. The FB/Cilium folks can
> definitely provide a better sample point than my hunch. ;-)

In our case the JIT time can be relevant, because we are effectively
holding up a Kubernetes pod load waiting for programs to load.
However, we can probably work around it by doing more aggressive
dynamic linking now that this is starting to land.
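For anyone skimming the thread: the pattern under discussion, as I
understand it, is a helper call whose return value is immediately
moved out of r0 into another register. A rough sketch of the check a
JIT peephole might do; this is illustrative only, not the actual
patch, and the function name is made up:

#include <stdbool.h>
#include <linux/bpf.h>	/* struct bpf_insn, BPF_* opcode macros */

/* Illustrative only: true when 'insn' is a call and the following
 * instruction does nothing but copy the return value (r0) into
 * another register. A JIT that spots this could emit the call so the
 * result lands directly in next->dst_reg and drop the extra move.
 */
static bool call_then_mov_from_r0(const struct bpf_insn *insn,
				  const struct bpf_insn *next)
{
	return insn->code == (BPF_JMP | BPF_CALL) &&
	       next->code == (BPF_ALU64 | BPF_MOV | BPF_X) &&
	       next->src_reg == BPF_REG_0;
}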
It would be interesting to have a test to measure load time in the
selftests, or perhaps in selftests/benchmark/. We have some of these
out of tree that we could push in, I think, if there is interest.

> Björn
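To make that concrete, a rough sketch of the kind of measurement I
have in mind, driven from userspace via libbpf (illustrative only:
it assumes libbpf's NULL-on-error conventions and skips proper error
reporting and averaging over runs):

#include <stdio.h>
#include <time.h>
#include <bpf/libbpf.h>

/* Sketch of a load-time micro-benchmark: time bpf_object__load(),
 * which is where the verifier and the JIT run.
 */
int main(int argc, char **argv)
{
	struct bpf_object *obj;
	struct timespec a, b;
	long ns;

	if (argc < 2)
		return 1;

	obj = bpf_object__open_file(argv[1], NULL);
	if (!obj)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &a);
	if (bpf_object__load(obj))
		return 1;
	clock_gettime(CLOCK_MONOTONIC, &b);

	ns = (b.tv_sec - a.tv_sec) * 1000000000L +
	     (b.tv_nsec - a.tv_nsec);
	printf("%s: load took %ld ns\n", argv[1], ns);

	bpf_object__close(obj);
	return 0;
}

Looping that over the selftests' object files and averaging a few
runs would give a cheap way to catch JIT-time regressions from this
kind of pass.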