Re: [PATCH bpf-next] bpf, xdp: add bpf_redirect{,_map}() leaf node detection and optimization

On Tue, Dec 01, 2020 at 06:23:45PM +0100, Björn Töpel wrote:
> +static void check_redirect_opt(struct bpf_verifier_env *env, int func_id, int insn_idx)
> +{
> +	struct bpf_insn *insns = env->prog->insnsi;
> +	int insn_cnt = env->prog->len;
> +	struct bpf_insn *insn;
> +	bool is_leaf = false;
> +
> +	if (!(func_id == BPF_FUNC_redirect || func_id == BPF_FUNC_redirect_map))
> +		return;
> +
> +	/* Naive peephole leaf node checking */
> +	insn_idx++;
> +	if (insn_idx >= insn_cnt)
> +		return;
> +
> +	insn = &insns[insn_idx];
> +	switch (insn->code) {
> +	/* Is the instruction following the call, an exit? */
> +	case BPF_JMP | BPF_EXIT:
> +		is_leaf = true;
> +		break;
> +	/* Follow the true branch of "if return value (r/w0) is not
> +	 * zero", and look for exit.
> +	 */
> +	case BPF_JMP | BPF_JSGT | BPF_K:
> +	case BPF_JMP32 | BPF_JSGT | BPF_K:
> +	case BPF_JMP | BPF_JGT | BPF_K:
> +	case BPF_JMP32 | BPF_JGT | BPF_K:
> +	case BPF_JMP | BPF_JNE | BPF_K:
> +	case BPF_JMP32 | BPF_JNE | BPF_K:
> +		if (insn->dst_reg == BPF_REG_0 && insn->imm == 0) {
> +			insn_idx += insn->off + 1;
> +			if (insn_idx >= insn_cnt)
> +				break;
> +
> +			insn = &insns[insn_idx];
> +			is_leaf = insn->code == (BPF_JMP | BPF_EXIT);
> +		}
> +		break;
> +	default:
> +		break;
> +	}

Sorry, I don't like this check at all. It's too fragile.
It will work for one hard-coded program.
It may work for something more real, but it will break with minimal
changes to the prog or to llvm.
How are we going to explain that fragility to users?
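To make that concrete, here is a sketch of how little it takes to
defeat the pattern match (a hypothetical program, not from this
series; "xsks_map" and "redirect_cnt" are made-up names):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
  	__uint(type, BPF_MAP_TYPE_XSKMAP);
  	__uint(max_entries, 64);
  	__type(key, __u32);
  	__type(value, __u32);
  } xsks_map SEC(".maps");

  long redirect_cnt; /* made-up counter, lives in .bss */

  SEC("xdp")
  int xdp_redirect_prog(struct xdp_md *ctx)
  {
  	int ret = bpf_redirect_map(&xsks_map, ctx->rx_queue_index,
  				   XDP_PASS);

  	/* one extra statement after the helper call, and the insn
  	 * following the call is neither BPF_EXIT nor a test on r0
  	 */
  	redirect_cnt++;
  	return ret;
  }

  char LICENSE[] SEC("license") = "GPL";

llvm is also free to spill/fill r0 or schedule unrelated insns between
the call and the exit, so the peephole silently stops matching without
the program author changing anything related to the redirect.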

> +static u64 __bpf_xdp_redirect_map_opt(struct bpf_map *map, u32 index, u64 flags,
> +				      struct bpf_redirect_info *ri)
> +{
> +	const struct bpf_prog *xdp_prog;
> +	struct net_device *dev;
> +	struct xdp_buff *xdp;
> +	void *val;
> +	int err;
> +
> +	xdp_prog = ri->xdp_prog_redirect_opt;
> +	xdp = ri->xdp;
> +	dev = xdp->rxq->dev;
> +
> +	ri->xdp_prog_redirect_opt = NULL;
> +
> +	switch (map->map_type) {
> +	case BPF_MAP_TYPE_DEVMAP: {
> +		val = __dev_map_lookup_elem(map, index);
> +		if (unlikely(!val))
> +			return flags;
> +		err = dev_map_enqueue(val, xdp, dev);
> +		break;
> +	}
> +	case BPF_MAP_TYPE_DEVMAP_HASH: {
> +		val = __dev_map_hash_lookup_elem(map, index);
> +		if (unlikely(!val))
> +			return flags;
> +		err = dev_map_enqueue(val, xdp, dev);
> +		break;
> +	}
> +	case BPF_MAP_TYPE_CPUMAP: {
> +		val = __cpu_map_lookup_elem(map, index);
> +		if (unlikely(!val))
> +			return flags;
> +		err = cpu_map_enqueue(val, xdp, dev);
> +		break;
> +	}
> +	case BPF_MAP_TYPE_XSKMAP: {
> +		val = __xsk_map_lookup_elem(map, index);
> +		if (unlikely(!val))
> +			return flags;
> +		err = __xsk_map_redirect(val, xdp);
> +		break;

I haven't looked through all possible paths, but it feels very dangerous.
The stack growth is big. Calling xsk_rcv with preemption disabled
and recursively calling into another bpf prog?
That violates all the stack checks we have in the verifier.
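The verifier's stack accounting stops at the program boundary, which
is exactly why this hurts. A sketch (hypothetical program, only to
illustrate the accounting; "xsks_map" is a made-up name):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
  	__uint(type, BPF_MAP_TYPE_XSKMAP);
  	__uint(max_entries, 64);
  	__type(key, __u32);
  	__type(value, __u32);
  } xsks_map SEC(".maps");

  SEC("xdp")
  int xdp_stack_heavy(struct xdp_md *ctx)
  {
  	/* eats a large chunk of the 512 byte MAX_BPF_STACK budget;
  	 * the verifier accepts it because it only accounts for the
  	 * program's own frame
  	 */
  	volatile __u8 scratch[384];

  	scratch[0] = 1;
  	scratch[sizeof(scratch) - 1] = scratch[0];

  	/* with this patch the whole enqueue path now runs on top of
  	 * this frame from inside the helper call, depth the verifier
  	 * never sees
  	 */
  	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index,
  				XDP_PASS);
  }

  char LICENSE[] SEC("license") = "GPL";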

I see plenty of cons and not a single pro in this patch.
A 5% improvement in a microbenchmark? That's hardly a justification.


