Re: [Potential Spoof] [PATCH bpf-next 3/5] bpf: Support replacing cgroup-bpf program in MULTI mode

Martin Lau <kafai@xxxxxx> · Thu, 12 Dec 2019 18:18:00 +0000

On Tue, Dec 10, 2019 at 06:33:29PM -0800, Andrey Ignatov wrote:
> The common use-case in production is to have multiple cgroup-bpf
> programs per attach type that cover multiple use-cases. Such programs
> are attached with BPF_F_ALLOW_MULTI and can be maintained by different
> people.
> 
> Order of programs usually matters, for example imagine two egress
> programs: the first one drops packets and the second one counts packets.
> If they're swapped the result of counting program will be different.
> 
> It brings operational challenges with updating cgroup-bpf program(s)
> attached with BPF_F_ALLOW_MULTI since there is no way to replace a
> program:
> 
> * One way to update is to detach all programs first and then attach the
>   new version(s) again in the right order. This introduces an
>   interruption in the work a program is doing and may not be acceptable
>   (e.g. if it's egress firewall);
> 
> * Another way is attach the new version of a program first and only then
>   detach the old version. This introduces the time interval when two
>   versions of same program are working, what may not be acceptable if a
>   program is not idempotent. It also imposes additional burden on
>   program developers to make sure that two versions of their program can
>   co-exist.
> 
> Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
> command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
> flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
> and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
> replace
> 
> That way user can replace any program among those attached with
> BPF_F_ALLOW_MULTI flag without the problems described above.
> 
> Details of the new API:
> 
> * If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
>   descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
>   error (EINVAL or EBADF).
> 
> * If replace_bpf_fd has valid descriptor of BPF program but such a
>   program is not attached to specified cgroup, BPF_PROG_ATTACH will
>   return ENOENT.
> 
> BPF_F_REPLACE is introduced to make the user intend clear, since
> replace_bpf_fd alone can't be used for this (its default value, 0, is a
> valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
> future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
Thanks for the details explanation.

[ ... ]

> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 283efe3ce052..45346c79613a 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -282,14 +282,17 @@ static int update_effective_progs(struct cgroup *cgrp,
>   *                         propagate the change to descendants
>   * @cgrp: The cgroup which descendants to traverse
>   * @prog: A program to attach
> + * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>   * @type: Type of attach operation
>   * @flags: Option flags
>   *
>   * Must be called with cgroup_mutex held.
>   */
>  int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> +			struct bpf_prog *replace_prog,
>  			enum bpf_attach_type type, u32 flags)
>  {
> +	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>  	struct list_head *progs = &cgrp->bpf.progs[type];
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
> @@ -298,14 +301,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	enum bpf_cgroup_storage_type stype;
>  	int err;
>  
> -	if ((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI))
> +	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
> +	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>  		/* invalid combination */
>  		return -EINVAL;
>  
>  	if (!hierarchy_allows_attach(cgrp, type))
>  		return -EPERM;
>  
> -	if (!list_empty(progs) && cgrp->bpf.flags[type] != flags)
> +	if (!list_empty(progs) && cgrp->bpf.flags[type] != saved_flags)
>  		/* Disallow attaching non-overridable on top
>  		 * of existing overridable in this cgroup.
>  		 * Disallow attaching multi-prog if overridable or none
> @@ -320,7 +324,12 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  			if (pl->prog == prog)
>  				/* disallow attaching the same prog twice */
>  				return -EINVAL;
> +			if (pl->prog == replace_prog)
> +				replace_pl = pl;
>  		}
> +		if ((flags & BPF_F_REPLACE) && !replace_pl)
> +			/* prog to replace not found for cgroup */
> +			return -ENOENT;
>  	} else if (!list_empty(progs)) {
>  		replace_pl = list_first_entry(progs, typeof(*pl), node);
>  	}
> @@ -356,7 +365,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	for_each_cgroup_storage_type(stype)
>  		pl->storage[stype] = storage[stype];
>  
> -	cgrp->bpf.flags[type] = flags;
> +	cgrp->bpf.flags[type] = saved_flags;
>  
>  	err = update_effective_progs(cgrp, type);
>  	if (err)
> @@ -522,6 +531,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  int cgroup_bpf_prog_attach(const union bpf_attr *attr,
>  			   enum bpf_prog_type ptype, struct bpf_prog *prog)
>  {
> +	struct bpf_prog *replace_prog = NULL;
>  	struct cgroup *cgrp;
>  	int ret;
>  
> @@ -529,8 +539,20 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
>  	if (IS_ERR(cgrp))
>  		return PTR_ERR(cgrp);
>  
> -	ret = cgroup_bpf_attach(cgrp, prog, attr->attach_type,
> +	if ((attr->attach_flags & BPF_F_ALLOW_MULTI) &&
> +	    (attr->attach_flags & BPF_F_REPLACE)) {
The patch looks good.  One optional nit for consideration,

Since it is testing BPF_F_REPLACE here already,
how about moving the
"((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI))"
test from __cgroup_bpf_attach() to this function here?
Clear the BPF_F_REPLACE bit before passing to cgroup_bpf_attach().

Then the "saved_flags" logic in cgroup_bpf_attach() can go away.
cgroup_bpf_attach() can work on the "replace_prog" alone.

Acked-by: Martin KaFai Lau <kafai@xxxxxx>

> +		replace_prog = bpf_prog_get_type(attr->replace_bpf_fd, ptype);
> +		if (IS_ERR(replace_prog)) {
> +			cgroup_put(cgrp);
> +			return PTR_ERR(replace_prog);
> +		}
> +	}
> +
> +	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type,
>  				attr->attach_flags);
> +
> +	if (replace_prog)
> +		bpf_prog_put(replace_prog);
>  	cgroup_put(cgrp);
>  	return ret;
>  }