Björn Töpel <bjorn.topel@xxxxxxxxx> writes: > From: Björn Töpel <bjorn.topel@xxxxxxxxx> > > The BPF dispatcher builds on top of the BPF trampoline ideas; > Introduce bpf_arch_text_poke() and (re-)use the BPF JIT generate > code. The dispatcher builds a dispatch table for XDP programs, for > retpoline avoidance. The table is a simple binary search model, so > lookup is O(log n). Here, the dispatch table is limited to four > entries (for laziness reason -- only 1B relative jumps :-P). If the > dispatch table is full, it will fallback to the retpoline path. So it's O(log n) with n == 4? Have you compared the performance of just doing four linear compare-and-jumps? Seems to me it may not be that big of a difference for such a small N? > An example: A module/driver allocates a dispatcher. The dispatcher is > shared for all netdevs. Each netdev allocate a slot in the dispatcher > and a BPF program. The netdev then uses the dispatcher to call the > correct program with a direct call (actually a tail-call). Is it really accurate to call it a tail call? To me, that would imply that it increments the tail call limit counter and all that? Isn't this just a direct jump using the trampoline stuff? -Toke