On Tue, Jul 21, 2020 at 03:56:36PM -0700, sdf@xxxxxxxxxx wrote: > On 07/21, Martin KaFai Lau wrote: > > On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote: > > > On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote: > > > > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead > > > > of UNSPEC for attach_type. It will be confusing on the syscall > > > > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS > > > > while the only cgroup program is actually attaching to > > BPF_CGROUP_INET_EGRESS. > > > > > > > > I don't have a clean way out for this. Adding a non-zero UNSPEC > > > > to "enum bpf_attach_type" seems wrong also. > > > > One possible way out is to allow the bpf_cgroup_storage_map > > > > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id) > > > > and the existing sizeof(struct bpf_cgroup_storage_key). That > > > > will completely remove attach_type from the picture if the user > > > > specified to use sizeof(cgroup_inode_id) as the key_size. > > > > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type > > > > is still used in cmp(). > > > > > > > > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly. > > > > Same changes is needed to lookup_elem, delete_elem, check_btf, > > map_alloc...etc. > > > > > > ACK. Considering that the cgroup_inode_id is the first field of struct > > > bpf_cgroup_storage_key, I can probably just use this fact and directly > > > use the user-provided map key (after it's copied to kernel space) and > > > cast the pointer to a __u64 - or are there any architectures where the > > > first field is not at offset zero? If this is done then we can change > > > all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring > > > the existence of the struct aside from the map creation (size checking > > > and BTF checking). > > Just to be clear. The problem is the userspace is expecting > > the whole key (cgroup_id, attach_type) to be meaningful and > > we cannot stop supporting this existing key type now. > To step back a bit. I think in the commit message we mentioned that > attach_type is essentially (mostly) meaningless right now. > If I want to share cgroup storage between BPF_CGROUP_INET_INGRESS and > BPF_CGROUP_INET_EGRESS, the verifier will refuse to load those programs. > So doing lookup with different attach_type for the same storage > shouldn't really happen. I think we are talking about the existing map->aux check in bpf_cgroup_storage_assign()? Right, the map is dedicated for only one bpf prog which usually has only one expected_attach_type. Thus, the attach_type of the key is not really useful since it will always be the same. However, it does not mean the exposed attach_type (of the key) is meaningless or the value is invalid. The exposed value is valid. I am trying to say attach_type is still part of the key and exposed to userspace (e.g. in map dump) but it is sort of invalid after this change because I am not sure what that "0" means now. Also, as part of the key, it is intuitive for user to think the storage is unique per (cgroup, attach_type). This uniqueness is always true now because of the map->aux logic and guaranteed by the verifier. With this patch, this "key" intuition is gone where the kernel quietly ignore the attach_type. > > Except. There is one use-case where it does make sense. If you take > the same loaded program and attach it to both ingress and egress, each > one will get a separate storage copy. And only in this case > attach_type really has any meaning. > > But since more and more attach types started to require > expected_attach_type, it seems that the case where the same > prog is attached to two different hooks should be almost > non-existent. That's why we've decided to go with > the idea of completely ignoring it. > > So the question is: is it really worth it trying to preserve > that obscure use-case (and have more complex implementation) or we can > safely assume that nobody is doing that sort of thing in the wild (and > can simplify the internal logic a bit)? I think it is a reasonable expectation for map-dump to show a meaningful attach_type if it is indeed part of the key.