Hi Martin, [ sorry to jump late in here, on pto currently ] On 06/22/2016 11:17 PM, Martin KaFai Lau wrote:
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations. To update an element, the caller is expected to obtain a cgroup2 backed fd by open(cgroup2_dir) and then update the array with that fd. Signed-off-by: Martin KaFai Lau <kafai@xxxxxx> Cc: Alexei Starovoitov <ast@xxxxxx> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx> Cc: Tejun Heo <tj@xxxxxxxxxx> Acked-by: Alexei Starovoitov <ast@xxxxxxxxxx>
Could you describe a bit more with regards to pinning maps and how this should interact with cgroups? The two specialized array maps we have (tail calls, perf events) have fairly complicated semantics for when to clean up map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4). How is this managed with cgroups? Once a cgroup fd is placed into a map and the user removes the cgroup, will this be prevented due to 'being busy', or will the cgroup live further as long as a program is running with a cgroup map entry (but the cgroup itself is not visible from user space in any way anymore)? I presume it's a valid use case to pin a cgroup map, put fds into it and remove the pinned file expecting to continue to match on it, right? So lifetime is really until last prog using a cgroup map somewhere gets removed (even if not accessible from user space anymore, meaning no prog has fd and pinned file was removed). I assume that using struct file here doesn't make sense (commit e03e7ee34fdd1c3) either, right? [...]
+#ifdef CONFIG_CGROUPS +static void *cgroup_fd_array_get_ptr(struct bpf_map *map, + struct file *map_file /* not used */, + int fd) +{ + return cgroup_get_from_fd(fd); +} + +static void cgroup_fd_array_put_ptr(void *ptr) +{ + /* cgroup_put free cgrp after a rcu grace period */ + cgroup_put(ptr);
Yeah, as long as this respects freeing after RCU grace period, it's fine like this ...
+} + +static void cgroup_fd_array_free(struct bpf_map *map) +{ + bpf_fd_array_map_clear(map); + fd_array_map_free(map); +} + +static const struct bpf_map_ops cgroup_array_ops = { + .map_alloc = fd_array_map_alloc, + .map_free = cgroup_fd_array_free, + .map_get_next_key = array_map_get_next_key, + .map_lookup_elem = fd_array_map_lookup_elem, + .map_delete_elem = fd_array_map_delete_elem, + .map_fd_get_ptr = cgroup_fd_array_get_ptr, + .map_fd_put_ptr = cgroup_fd_array_put_ptr, +}; + +static struct bpf_map_type_list cgroup_array_type __read_mostly = { + .ops = &cgroup_array_ops, + .type = BPF_MAP_TYPE_CGROUP_ARRAY, +}; + +static int __init register_cgroup_array_map(void) +{ + bpf_register_map_type(&cgroup_array_type); + return 0; +} +late_initcall(register_cgroup_array_map); +#endif diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c23a4e93..cac13f1 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr) } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, attr->flags); } else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || - map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { + map->map_type == BPF_MAP_TYPE_PROG_ARRAY || + map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) { rcu_read_lock(); err = bpf_fd_array_map_update_elem(map, f.file, key, value, attr->flags);
-- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html