Re: [PATCH v5 bpf-next 3/5] bpf: Make cgroup storages shared between programs on the same cgroup

Martin KaFai Lau <kafai@xxxxxx> · Thu, 23 Jul 2020 17:00:19 -0700



On Thu, Jul 23, 2020 at 02:40:56AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@xxxxxxxxxx>
> 
> This change comes in several parts:
> 
> One, the restriction that the CGROUP_STORAGE map can only be used
> by one program is removed. This results in the removal of the field
> 'aux' in struct bpf_cgroup_storage_map, and removal of relevant
> code associated with the field, and removal of now-noop functions
> bpf_free_cgroup_storage and bpf_cgroup_storage_release.
> 
> Second, we permit a key of type u64 as the key to the map.
> Providing such a key type indicates that the map should ignore
> attach type when comparing map keys. However, for simplicity newly
> linked storage will still have the attach type at link time in
> its key struct. cgroup_storage_check_btf is adapted to accept
> u64 as the type of the key.
> 
> Third, because the storages are now shared, the storages cannot
> be unconditionally freed on program detach. There could be two
> ways to solve this issue:
> * A. Reference count the usage of the storages, and free when the
>      last program is detached.
> * B. Free only when the storage is impossible to be referred to
>      again, i.e. when either the cgroup_bpf it is attached to, or
>      the map itself, is freed.
> Option A has the side effect that, when the user detach and
> reattach a program, whether the program gets a fresh storage
> depends on whether there is another program attached using that
> storage. This could trigger races if the user is multi-threaded,
> and since nondeterminism in data races is evil, go with option B.
> 
> The both the map and the cgroup_bpf now tracks their associated
> storages, and the storage unlink and free are removed from
> cgroup_bpf_detach and added to cgroup_bpf_release and
> cgroup_storage_map_free. The latter also new holds the cgroup_mutex
> to prevent any races with the former.
> 
> Fourth, on attach, we reuse the old storage if the key already
> exists in the map, via cgroup_storage_lookup. If the storage
> does not exist yet, we create a new one, and publish it at the
> last step in the attach process. This does not create a race
> condition because for the whole attach the cgroup_mutex is held.
> We keep track of an array of new storages that was allocated
> and if the process fails only the new storages would get freed.
> 
[ ... ]

> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 51bd5a8cb01b..b246ae07f87d 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -9,6 +9,8 @@
>  #include <linux/slab.h>
>  #include <uapi/linux/btf.h>
>  
> +#include "../cgroup/cgroup-internal.h"
> +
>  DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
>  
>  #ifdef CONFIG_CGROUP_BPF
> @@ -20,7 +22,6 @@ struct bpf_cgroup_storage_map {
>  	struct bpf_map map;
>  
>  	spinlock_t lock;
> -	struct bpf_prog_aux *aux;
>  	struct rb_root root;
>  	struct list_head list;
>  };
> @@ -30,24 +31,41 @@ static struct bpf_cgroup_storage_map *map_to_storage(struct bpf_map *map)
>  	return container_of(map, struct bpf_cgroup_storage_map, map);
>  }
>  
> -static int bpf_cgroup_storage_key_cmp(
> -	const struct bpf_cgroup_storage_key *key1,
> -	const struct bpf_cgroup_storage_key *key2)
> +static bool attach_type_isolated(const struct bpf_map *map)
>  {
> -	if (key1->cgroup_inode_id < key2->cgroup_inode_id)
> -		return -1;
> -	else if (key1->cgroup_inode_id > key2->cgroup_inode_id)
> -		return 1;
> -	else if (key1->attach_type < key2->attach_type)
> -		return -1;
> -	else if (key1->attach_type > key2->attach_type)
> -		return 1;
> +	return map->key_size == sizeof(struct bpf_cgroup_storage_key);
> +}
> +
> +static int bpf_cgroup_storage_key_cmp(const struct bpf_cgroup_storage_map *map,
> +				      const void *_key1, const void *_key2)
> +{
> +	if (attach_type_isolated(&map->map)) {
> +		const struct bpf_cgroup_storage_key *key1 = _key1;
> +		const struct bpf_cgroup_storage_key *key2 = _key2;
> +
> +		if (key1->cgroup_inode_id < key2->cgroup_inode_id)
> +			return -1;
> +		else if (key1->cgroup_inode_id > key2->cgroup_inode_id)
> +			return 1;
> +		else if (key1->attach_type < key2->attach_type)
> +			return -1;
> +		else if (key1->attach_type > key2->attach_type)
> +			return 1;
> +	} else {
> +		const __u64 *key1 = _key1;
> +		const __u64 *key2 = _key2;
Nit.
Instead of key1 and key2, a meaningful name will be easier
to read and tell what it is comparing about later, like the
key1->"cgroup_inode_id" in the "if" case above.  May be:

		__u64 cgroup_inode_id1 = *(__u64 *)_key1;
		__u64 cgroup_inode_id2 = *(__u64 *)_key2;

> +
> +		if (*key1 < *key2)
> +			return -1;
> +		else if (*key1 > *key2)
> +			return 1;
> +	}
>  	return 0;
>  }
>  
Others LGTM