On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote: > On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead > > of UNSPEC for attach_type. It will be confusing on the syscall > > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS > > while the only cgroup program is actually attaching to BPF_CGROUP_INET_EGRESS. > > > > I don't have a clean way out for this. Adding a non-zero UNSPEC > > to "enum bpf_attach_type" seems wrong also. > > One possible way out is to allow the bpf_cgroup_storage_map > > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id) > > and the existing sizeof(struct bpf_cgroup_storage_key). That > > will completely remove attach_type from the picture if the user > > specified to use sizeof(cgroup_inode_id) as the key_size. > > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type > > is still used in cmp(). > > > > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly. > > Same changes is needed to lookup_elem, delete_elem, check_btf, map_alloc...etc. > > ACK. Considering that the cgroup_inode_id is the first field of struct > bpf_cgroup_storage_key, I can probably just use this fact and directly > use the user-provided map key (after it's copied to kernel space) and > cast the pointer to a __u64 - or are there any architectures where the > first field is not at offset zero? If this is done then we can change > all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring > the existence of the struct aside from the map creation (size checking > and BTF checking). Just to be clear. The problem is the userspace is expecting the whole key (cgroup_id, attach_type) to be meaningful and we cannot stop supporting this existing key type now. This patch essentially sets all attach_type to 0 to mean unused/UNSPEC/dont_care but 0 is actually BPF_CGROUP_INET_INGRESS in bpf's uapi. "map dump" will be an issue as mentioned earlier. Also, as a key of a map, it is logical for the user to assume that different attach_type will have different storage. e.g. If user lookup by (cgroup_id, 1), what does it mean to quietly return the storage created by (cgroup_id, 0/2/3/4/5)? I think from the map's perspective, it makes more sense to have different "key-cmp" operation for different key type. When the map is created with key_size == (cgroup_id, attach_type), the "cmp" will stay as is to compare the attach_type also. Then the storage will be sharable among the same cgroup and the same attach_type. Then add support for the new key_size == (cgroup_id) which the "cmp" will only compare the "cgroup_id" as this patch does so that the storage can be shared among all attach_types of the same cgroup. Does it make sense?