Re: [PATCH v4 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup

sdf@xxxxxxxxxx · Tue, 21 Jul 2020 15:56:36 -0700

On 07/21, Martin KaFai Lau wrote:
On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote:
> On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
> > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead
> > of UNSPEC for attach_type.  It will be confusing on the syscall
> > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS
> > while the only cgroup program is actually attaching to  
BPF_CGROUP_INET_EGRESS.
> >
> > I don't have a clean way out for this.  Adding a non-zero UNSPEC
> > to "enum bpf_attach_type" seems wrong also.
> > One possible way out is to allow the bpf_cgroup_storage_map
> > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
> > and the existing sizeof(struct bpf_cgroup_storage_key).  That
> > will completely remove attach_type from the picture if the user
> > specified to use sizeof(cgroup_inode_id) as the key_size.
> > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
> > is still used in cmp().
> >
> > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly.
> > Same changes is needed to lookup_elem, delete_elem, check_btf,  
map_alloc...etc.
>
> ACK. Considering that the cgroup_inode_id is the first field of struct
> bpf_cgroup_storage_key, I can probably just use this fact and directly
> use the user-provided map key (after it's copied to kernel space) and
> cast the pointer to a __u64 - or are there any architectures where the
> first field is not at offset zero? If this is done then we can change
> all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring
> the existence of the struct aside from the map creation (size checking
> and BTF checking).
Just to be clear.  The problem is the userspace is expecting
the whole key (cgroup_id, attach_type) to be meaningful and
we cannot stop supporting this existing key type now.
To step back a bit. I think in the commit message we mentioned that
attach_type is essentially (mostly) meaningless right now.
If I want to share cgroup storage between BPF_CGROUP_INET_INGRESS and
BPF_CGROUP_INET_EGRESS, the verifier will refuse to load those programs.
So doing lookup with different attach_type for the same storage
shouldn't really happen.

Except. There is one use-case where it does make sense. If you take
the same loaded program and attach it to both ingress and egress, each
one will get a separate storage copy. And only in this case
attach_type really has any meaning.

But since more and more attach types started to require
expected_attach_type, it seems that the case where the same
prog is attached to two different hooks should be almost
non-existent. That's why we've decided to go with
the idea of completely ignoring it.

So the question is: is it really worth it trying to preserve
that obscure use-case (and have more complex implementation) or we can
safely assume that nobody is doing that sort of thing in the wild (and
can simplify the internal logic a bit)?