On 04/18/2017 01:04 AM, Alexei Starovoitov wrote:
> On Mon, Apr 17, 2017 at 03:49:55PM -0400, David Miller wrote:
>> From: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
>> Date: Sun, 16 Apr 2017 22:26:01 +0200
>>
>>> The bpf tail-call use-case is a very good example of why the verifier
>>> cannot deduce the needed HEADROOM upfront.
>>
>> This brings up a very interesting question for me.
>>
>> I notice that tail calls are implemented by JITs largely by skipping
>> over the prologue of that destination program.
>>
>> However, many JITs preload cached SKB values into fixed registers in
>> the prologue. But they only do this if the program being JITed needs
>> those values.
>>
>> So how can it work properly if a program that does not need the SKB
>> values tail calls into one that does?
>
> For x86 JIT it's fine, since caching of skb values is not part of the
> prologue:
>
>         emit_prologue(&prog);
>         if (seen_ld_abs)
>                 emit_load_skb_data_hlen(&prog);
>
> and tail_call jumps into the next program as:
>
>         EMIT4(0x48, 0x83, 0xC0, PROLOGUE_SIZE);   /* add rax, prologue_size */
>         EMIT2(0xFF, 0xE0);                        /* jmp rax */
>
> whereas inside emit_prologue() we have:
>
>         BUILD_BUG_ON(cnt != PROLOGUE_SIZE);
>
> arm64 has similar prologue-skipping code, and it's even simpler than
> x86, since it doesn't try to optimize LD_ABS/IND in assembler and
> instead calls into bpf_load_pointer() from generated code, so there is
> no caching of skb values at all.
>
> The s390 JIT does partial skipping of the prologue, since a bunch of
> registers are saved/restored during tail_call, and it looks fine to me
> as well.
And ppc64 unwinds/tears down the stack of the prog before jumping into
the other program. Thus there is no skipping of the other program's
prologue; looks fine, too.
It's very hard to extend test_bpf.ko with tail_calls, since maps need to
be allocated and populated with file descriptors, which is not feasible
from a .ko. Instead we need a user-space-based test for it. We've started
building one in tools/testing/selftests/bpf/test_progs.c; many more tests
need to be added. Thorough testing of tail_calls is on the todo list.