On Wed, Jul 15, 2020 at 04:06:45PM -0500, YiFei Zhu wrote:
> > > +	list_for_each_entry_safe(storage, stmp, storages, list_cg) {
> > > +		bpf_cgroup_storage_unlink(storage);
> > > +		bpf_cgroup_storage_free(storage);
> > cgroup_storage_map_free() is also doing unlink and free.
> > It is not clear to me what prevents cgroup_bpf_release()
> > and cgroup_storage_map_free() from doing unlink and free at the same time.
> >
> > A few words of comment here would be useful if it is fine.
>
> Good catch. This can happen. Considering that this section is guarded
> by cgroup_mutex, and that cgroup_storage_map_free is called in
> sleepable context (either workqueue or map_create), I think the most
> straightforward way is to also hold the cgroup_mutex in
> cgroup_storage_map_free. Will fix in v3.
>
> > > @@ -458,10 +457,10 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> > >  	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
> > >  		return -ENOMEM;
> > >
> > > +	bpf_cgroup_storages_link(storage, cgrp);
> > here. After the new change in bpf_cgroup_storage_link(),
> > the storage could be an old/existing storage that is
> > being used by other bpf progs.
>
> Right, I will swap storages_alloc to below this kmalloc and goto
> cleanup if storages_alloc fails.
>
> > > -	bpf_cgroup_storages_free(old_storage);
> > >  	if (old_prog)
> > >  		bpf_prog_put(old_prog);
> > >  	else
> > >  		static_branch_inc(&cgroup_bpf_enabled_key);
> > > -	bpf_cgroup_storages_link(pl->storage, cgrp, type);
> > Another side effect is that the "new" storage is still published to
> > the map even if the attach has failed. I think this may be ok.
>
> Correct. To really fix this we would need to keep track of exactly
> which storages are new and which are old, and I don't think that doing
> so has any significant benefits given that the lifetime of the
> storages is still bounded. If userspace receives the error, it is
> probably going to either exit or retry. Exit will cause the storages
> to be freed along with the map, and retry, if successful, needs the
> storage to be published anyway. That is the reasoning for thinking it
> is okay.
It should not be hard. Stay with the existing approach: lookup old,
found => reuse, not-found => alloc. Only publish the new storage after
the attach has succeeded.

>
> > > -	next->attach_type = storage->key.attach_type;
> > The map dump (e.g. bpftool map dump) will also show attach_type zero
> > in the key now. Please also mention that in the commit message.
>
> Will fix in v3.
>
> > This patch allows a cgroup-storage to be shared among different bpf-progs,
> > which is in the right direction to make bpf_cgroup_storage_map behave
> > more like other bpf-maps do. However, each bpf-prog can still only use
> > one "bpf_cgroup_storage_map" (excluding the difference in the
> > SHARED/PERCPU bpf_cgroup_storage_type).
> > i.e. each bpf-prog can only access one type of cgroup-storage.
> > e.g. prog-A stores storage-A. If prog-B wants to store storage-B and
> > also read storage-A, it is not possible if I read it correctly.
>
> You are correct. It is still bound by the per-cpu variable, and
> because the per-cpu variable is an array of storages, one for each
> type, it still does not support more than one storage per type.
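
For reference, this is roughly where that per-type limit lives today
(quoting from memory, so the exact kernel lines may differ slightly):

	/* kernel/bpf/local_storage.c: one slot per storage type, per cpu */
	DEFINE_PER_CPU(struct bpf_cgroup_storage*,
		       bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);

	/* kernel/bpf/helpers.c, bpf_get_local_storage(): the helper only
	 * reads the single slot matching the map's storage type, so a
	 * prog cannot reference two SHARED (or two PERCPU) cgroup-storage
	 * maps at the same time.
	 */
	enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
	struct bpf_cgroup_storage *storage;

	storage = this_cpu_read(bpf_cgroup_storage[stype]);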

>
> > While I think this patch is a fine extension to the existing
> > bpf_cgroup_storage_map and a good step forward to make
> > bpf_cgroup_storage_map sharable like other bpf maps do, have you
> > looked at bpf_sk_storage.c, which also defines a local storage for
> > a sk but where a bpf prog can define multiple storages to be stored
> > in a sk? It is doing a similar thing to this patch (e.g. a link to
> > the storage, another link to the map, the lifetime of the storage is
> > tied to the map and the sk, etc.). KP Singh is generalizing it such
> > that a bpf-prog can store data in potentially any kernel object other
> > than sk [1]. His use case is to store data in an inode. I think it
> > can be used for the cgroup also. The only thing missing there is the
> > "PERCPU" type. It was not there because there is no such need for sk,
> > but it should be something quite doable.
>
> I think this is a good idea, but it will be a much bigger project. I
> would prefer to fix this problem of cgroup_storage not being shareable
> first, and when KP's patches land I'll take a look at how to reuse
> their code. And, as you said, it would be more involved due to needing
> to add "PERCPU" support.
I am not against this patch, considering it is a feature extension to
the existing bpf_cgroup_storage_map. The bpf_local_storage work is
still in progress, so the timing may not align anyway.

However, it is not very complicated to create storage for another
kernel object either. You should take a look at KP Singh's
bpf_inode_storage.c, which shows it is pretty straightforward to create
storage for an inode. The locking/racing issues discussed in this
thread have already been considered in the bpf_(sk|local)_storage.

Also, if the generalized bpf_local_storage is used, a completely
separate new map will be cleaner (i.e. completely separate from
bpf_cgroup_storage_map, and not much of it will be reused).
e.g. unlike bpf_cgroup_storage_map, there is no need to create storage
at attach time. The storage will be created at runtime when the very
first bpf-prog accesses it, much like bpf_sk_storage does today (see
the sketch below).
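
To illustrate that runtime-creation model, here is a rough sketch of how
the existing sk-storage is used from a prog (the value struct, map name,
and prog name below are made up for illustration; a cgroup variant would
need its own map type and helper):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* per-socket value; created lazily, not at attach time */
struct my_storage {
	__u64 invoked;
};

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct my_storage);
} sk_stg_map SEC(".maps");

SEC("sockops")
int count_sockops(struct bpf_sock_ops *ctx)
{
	struct bpf_sock *sk = ctx->sk;
	struct my_storage *stg;

	if (!sk)
		return 1;

	/* The storage for this sk does not exist until the very first
	 * lookup with BPF_SK_STORAGE_GET_F_CREATE; nothing is allocated
	 * at attach time.
	 */
	stg = bpf_sk_storage_get(&sk_stg_map, sk, NULL,
				 BPF_SK_STORAGE_GET_F_CREATE);
	if (!stg)
		return 1;

	stg->invoked++;
	return 1;
}

char _license[] SEC("license") = "GPL";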