Re: [PATCH bpf-next v3 1/8] bpf: Add generic attach/detach/query API for multi-progs

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sun, 9 Jul 2023 10:17:28 -0700

On Fri, Jul 07, 2023 at 07:24:48PM +0200, Daniel Borkmann wrote:
> +
> +#define BPF_MPROG_KEEP	0
> +#define BPF_MPROG_SWAP	1
> +#define BPF_MPROG_FREE	2

Please document how this is suppose to be used.
Patch 2 is using BPF_MPROG_FREE in tcx_entry_needs_release().
Where most of the code treats BPF_MPROG_SWAP and BPF_MPROG_FREE as equivalent.
I can guess what it's for, but a comment would help.

> +
> +#define BPF_MPROG_MAX	64

we've seen buggy user space attaching thousands of tc progs to the same netdev.
I suspect 64 limit will be hit much sooner than expected.

> +
> +#define bpf_mprog_foreach_tuple(entry, fp, cp, t)			\
> +	for (fp = &entry->fp_items[0], cp = &entry->parent->cp_items[0];\
> +	     ({								\
> +		t.prog = READ_ONCE(fp->prog);				\
> +		t.link = cp->link;					\
> +		t.prog;							\
> +	      });							\
> +	     fp++, cp++)
> +
> +#define bpf_mprog_foreach_prog(entry, fp, p)				\
> +	for (fp = &entry->fp_items[0];					\
> +	     (p = READ_ONCE(fp->prog));					\
> +	     fp++)

I have similar questions to Stanislav.
Looks like update/delete/query of bpf_prog should be protected by an external lock
(like RTNL in case of tcx),
but what are the life time rules for 'entry'?
Looking at patch 2 sch_handle_ingress():
struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress);
I suspect the assumption is that bpf_mprog_entry object should be accessed within
RCU critical section. Since tc/tcx and XDP run in napi we have RCU protection there.
In the future, for cgroups, bpf_prog_run_array_cg() will keep explicit rcu_read_lock()
before accessing bpf_mprog_entry, right?
And bpf_mprog_commit() assumes that RCU protection.
All fine, but we need to document that mprog mechanism is not suitable for sleepable progs.

> +	if (flags & BPF_F_BEFORE) {
> +		tidx = bpf_mprog_pos_before(entry, &rtuple);
> +		if (tidx < -1 || (idx >= -1 && tidx != idx)) {
> +			ret = tidx < -1 ? tidx : -EDOM;
> +			goto out;
> +		}
> +		idx = tidx;
> +	}
> +	if (flags & BPF_F_AFTER) {
> +		tidx = bpf_mprog_pos_after(entry, &rtuple);
> +		if (tidx < 0 || (idx >= -1 && tidx != idx)) {

tidx < 0 vs tidx < -1 for _after vs _before.
Does it have to have this subtle difference?
Can _after and _before have the same semantics for return value?

> +			ret = tidx < 0 ? tidx : -EDOM;
> +			goto out;
> +		}
> +		idx = tidx;
> +	}