Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:

> On 11/14/19 1:31 PM, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@xxxxxxxxx> writes:
>>> From: Björn Töpel <bjorn.topel@xxxxxxxxx>
>>>
>>> The BPF dispatcher builds on top of the BPF trampoline ideas;
>>> introduce bpf_arch_text_poke() and (re-)use the BPF JIT generated
>>> code. The dispatcher builds a dispatch table for XDP programs, for
>>> retpoline avoidance. The table is a simple binary search model, so
>>> lookup is O(log n). Here, the dispatch table is limited to four
>>> entries (for laziness reasons -- only 1B relative jumps :-P). If the
>>> dispatch table is full, it will fall back to the retpoline path.
>>
>> So it's O(log n) with n == 4? Have you compared the performance of just
>> doing four linear compare-and-jumps? Seems to me it may not be that big
>> of a difference for such a small N?
>
> Did you perform some microbenchmarks wrt the search tree? Mainly wondering
> since for code emission for switch/case statements, clang/gcc turns off
> indirect calls entirely under retpoline, see [0] from back then.

Yes, this was exactly the example I had in mind :)

>>> An example: A module/driver allocates a dispatcher. The dispatcher is
>>> shared for all netdevs. Each netdev allocates a slot in the dispatcher
>>> and a BPF program. The netdev then uses the dispatcher to call the
>>> correct program with a direct call (actually a tail-call).
>>
>> Is it really accurate to call it a tail call? To me, that would imply
>> that it increments the tail call limit counter and all that? Isn't this
>> just a direct jump using the trampoline stuff?
>
> Not meant in the BPF context here, but more general [1].

Ah, right, that makes more sense.

> (For actual BPF tail calls I have a series close to ready for getting
> rid of most indirect calls, which I'll post later today.)

Cool!

-Toke
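
For readers following the thread, below is a rough userspace C sketch of the
two dispatch shapes being compared: a linear chain of compare-and-call versus
a small binary search over a four-entry table, both falling back to a plain
(retpolined) indirect call when the program is not in the table. All names
here (xdp_prog_fn, slots, dispatch_linear, dispatch_bsearch, the prog_* stubs)
are invented for illustration; the real dispatcher is machine code emitted by
the BPF JIT, where each matched compare guards a direct call, which plain C
cannot express.

/*
 * Illustrative sketch only -- not the actual JIT-emitted dispatcher.
 */
#include <stdint.h>
#include <stdio.h>

#define DISPATCH_SLOTS 4

typedef int (*xdp_prog_fn)(void *ctx);

/* Stand-ins for four attached XDP programs. */
static int prog_a(void *ctx) { (void)ctx; return 0; }
static int prog_b(void *ctx) { (void)ctx; return 1; }
static int prog_c(void *ctx) { (void)ctx; return 2; }
static int prog_d(void *ctx) { (void)ctx; return 3; }

/* Dispatch table, kept sorted by address so it can be binary-searched. */
static xdp_prog_fn slots[DISPATCH_SLOTS];

/* Linear variant: up to four equality compares, then fallback. */
static int dispatch_linear(xdp_prog_fn prog, void *ctx)
{
	if (prog == slots[0]) return slots[0](ctx);
	if (prog == slots[1]) return slots[1](ctx);
	if (prog == slots[2]) return slots[2](ctx);
	if (prog == slots[3]) return slots[3](ctx);
	return prog(ctx); /* not in the table: indirect (retpoline) call */
}

/* Binary-search variant: one range compare picks a pair of slots, then
 * at most two equality compares decide between call and fallback. */
static int dispatch_bsearch(xdp_prog_fn prog, void *ctx)
{
	uintptr_t p = (uintptr_t)prog;

	if (p <= (uintptr_t)slots[1]) {
		if (prog == slots[0]) return slots[0](ctx);
		if (prog == slots[1]) return slots[1](ctx);
	} else {
		if (prog == slots[2]) return slots[2](ctx);
		if (prog == slots[3]) return slots[3](ctx);
	}
	return prog(ctx); /* not in the table: indirect (retpoline) call */
}

/* Keep the table sorted by address; runs only when slots change. */
static void sort_slots(void)
{
	for (int i = 0; i < DISPATCH_SLOTS; i++)
		for (int j = i + 1; j < DISPATCH_SLOTS; j++)
			if ((uintptr_t)slots[j] < (uintptr_t)slots[i]) {
				xdp_prog_fn tmp = slots[i];
				slots[i] = slots[j];
				slots[j] = tmp;
			}
}

int main(void)
{
	slots[0] = prog_a;
	slots[1] = prog_b;
	slots[2] = prog_c;
	slots[3] = prog_d;
	sort_slots();

	/* Both variants resolve the same program; the difference is the
	 * worst-case number of compares on the hot path (4 vs. 3). */
	printf("linear:  %d\n", dispatch_linear(prog_c, NULL));
	printf("bsearch: %d\n", dispatch_bsearch(prog_c, NULL));
	return 0;
}

With only four entries the two shapes differ by at most one compare on the
worst-case path, which is why the comparison with a plain compare-and-jump
chain (and with what clang/gcc emit for small switch statements under
retpoline, per [0]) comes up in the discussion above.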