Re: [PATCH bpf-next v2 1/5] bpf: Support injecting chain calls into BPF programs on load

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sun, 6 Oct 2019 17:27:40 -0700

On Fri, Oct 04, 2019 at 07:22:41PM +0200, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> 
> This adds support for injecting chain call logic into eBPF programs before
> they return. The code injection is controlled by a flag at program load
> time; if the flag is set, the verifier will add code to every BPF_EXIT
> instruction that first does a lookup into a chain call structure to see if
> it should call into another program before returning. The actual calls
> reuse the tail call infrastructure.
> 
> Ideally, it shouldn't be necessary to set the flag on program load time,
> but rather inject the calls when a chain call program is first loaded.
> However, rewriting the program reallocates the bpf_prog struct, which is
> obviously not possible after the program has been attached to something.
> 
> One way around this could be a sysctl to force the flag on (for enforcing
> system-wide support). Another could be to have the chain call support
> itself built into the interpreter and JIT, which could conceivably be
> re-run each time we attach a new chain call program. This would also allow
> the JIT to inject direct calls to the next program instead of using the
> tail call infrastructure, which presumably would be a performance win. The
> drawback is, of course, that it would require modifying all the JITs.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
...
>  
> +static int bpf_inject_chain_calls(struct bpf_verifier_env *env)
> +{
> +	struct bpf_prog *prog = env->prog;
> +	struct bpf_insn *insn = prog->insnsi;
> +	int i, cnt, delta = 0, ret = -ENOMEM;
> +	const int insn_cnt = prog->len;
> +	struct bpf_array *prog_array;
> +	struct bpf_prog *new_prog;
> +	size_t array_size;
> +
> +	struct bpf_insn call_next[] = {
> +		BPF_LD_IMM64(BPF_REG_2, 0),
> +		/* Save real return value for later */
> +		BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
> +		/* First try tail call with index ret+1 */
> +		BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
> +		BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 1),
> +		BPF_RAW_INSN(BPF_JMP | BPF_TAIL_CALL, 0, 0, 0, 0),
> +		/* If that doesn't work, try with index 0 (wildcard) */
> +		BPF_MOV64_IMM(BPF_REG_3, 0),
> +		BPF_RAW_INSN(BPF_JMP | BPF_TAIL_CALL, 0, 0, 0, 0),
> +		/* Restore saved return value and exit */
> +		BPF_MOV64_REG(BPF_REG_0, BPF_REG_6),
> +		BPF_EXIT_INSN()
> +	};

How did you test it?
With the only test from patch 5?
+int xdp_drop_prog(struct xdp_md *ctx)
+{
+       return XDP_DROP;
+}

Please try a different program with more than one instruction.
Then look at the asm above and think about how it can be changed to
get a valid R1 (the injected tail calls take ctx in R1) all the way to
each bpf_exit insn.
Do you see the amount of headaches this approach has?
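
To make the R1 concern concrete, here is a toy userspace model, not
kernel code. It assumes only the BPF calling convention (ctx arrives in
R1, helper calls clobber R1-R5) and tracks whether R1 still holds ctx
when the program reaches its exit. All type and function names are
hypothetical.

```c
/* Toy userspace model of BPF register state. Not kernel code: the type
 * and function names below are hypothetical and only echo the verifier.
 */
enum reg { R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, NREGS };
enum reg_state { UNINIT, SCALAR, PTR_TO_CTX, PTR_TO_STACK };

struct regs {
	enum reg_state r[NREGS];
};

/* On entry only R1 (ctx) and R10 (frame pointer) are readable. */
void prog_entry(struct regs *s)
{
	int i;

	for (i = 0; i < NREGS; i++)
		s->r[i] = UNINIT;
	s->r[R1] = PTR_TO_CTX;
	s->r[R10] = PTR_TO_STACK;
}

/* A helper call returns in R0 and clobbers the argument regs R1-R5. */
void helper_call(struct regs *s)
{
	int i;

	for (i = R1; i <= R5; i++)
		s->r[i] = UNINIT;
	s->r[R0] = SCALAR;
}

/* dst = imm */
void mov_imm(struct regs *s, enum reg dst)
{
	s->r[dst] = SCALAR;
}

/* The injected tail call takes ctx in R1; report whether R1 still is ctx. */
int r1_is_ctx_at_exit(const struct regs *s)
{
	return s->r[R1] == PTR_TO_CTX;
}

/* Models "return XDP_DROP;": r0 = imm; exit. R1 is never touched. */
int single_insn_prog(void)
{
	struct regs s;

	prog_entry(&s);
	mov_imm(&s, R0);
	return r1_is_ctx_at_exit(&s);
}

/* Models a prog that calls any helper before returning a constant. */
int prog_with_helper_call(void)
{
	struct regs s;

	prog_entry(&s);
	helper_call(&s);
	mov_imm(&s, R0);
	return r1_is_ctx_at_exit(&s);
}
```

In this model single_insn_prog() reaches exit with ctx still in R1, while
prog_with_helper_call() does not. That is why a one-instruction test can
pass even though the injected sequence is not valid in general.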

The way you explained the use case of an XDP-based firewall plus an
XDP-based IPS/IDS, it's about a "knows nothing" admin who has to deal with
more than one XDP application on an unfamiliar server.
This is a debugging case.
The admin would probably want to see all the values the xdp prog returns,
right? One possible answer is to add a tracepoint to bpf_prog_run_xdp().
Most drivers have XDP_DROP stats, so some visibility into drops
is already available.
Dumping all the packets that the xdp prog is dropping into user space via
another xdp application is imo pointless. User space won't be able to
process this rate of packets. A human admin won't be able to "grep" through
millions of packets either.
The xdp-firewall prog is dropping the packets for some reason.
That reason is what the admin is looking for!
The admin wants to see inside the program.
The actual content of the packet is like bread crumbs.
The authors of the xdp firewall may find packet dumps useful,
but the poor admin who was tasked with debugging an unknown xdp application
will not find anything useful in those packets.
I think what you're advocating for is better xdp debugging.
Let's work on that. Let's focus on designing a good debugging facility.
This chaining feature isn't necessary for that and looks unlikely to converge.