On Wed, Jul 15, 2020 at 04:06:45PM -0500, YiFei Zhu wrote:
> > > +	list_for_each_entry_safe(storage, stmp, storages, list_cg) {
> > > +		bpf_cgroup_storage_unlink(storage);
> > > +		bpf_cgroup_storage_free(storage);
> > cgroup_storage_map_free() is also doing unlink and free.
> > It is not clear to me what prevents cgroup_bpf_release()
> > and cgroup_storage_map_free() from doing unlink and free at the same time.
> >
> > A few words of comment here would be useful if it is fine.
>
> Good catch. This can happen. Considering that this section is guarded
> by cgroup_mutex, and that cgroup_storage_map_free is called in
> sleepable context (either workqueue or map_create), I think the most
> straightforward way is to also hold the cgroup_mutex in
> cgroup_storage_map_free. Will fix in v3.
>
> > > @@ -458,10 +457,10 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> > >  	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
> > >  		return -ENOMEM;
> > >
> > > +	bpf_cgroup_storages_link(storage, cgrp);
> > here. After the new change in bpf_cgroup_storage_link(),
> > the storage could be an old/existing storage that is
> > being used by other bpf progs.
>
> Right, I will swap storages_alloc to below this kmalloc and goto
> cleanup if storages_alloc fails.
>
> > > -	bpf_cgroup_storages_free(old_storage);
> > >  	if (old_prog)
> > >  		bpf_prog_put(old_prog);
> > >  	else
> > >  		static_branch_inc(&cgroup_bpf_enabled_key);
> > > -	bpf_cgroup_storages_link(pl->storage, cgrp, type);
> > Another side effect is that the "new" storage is still published to
> > the map even if the attach has failed. I think this may be ok.
>
> Correct. To really fix this we would need to keep track of exactly
> which storages are new and which are old, and I don't think that doing
> so has any significant benefits given that the lifetime of the
> storages is still bounded. If userspace receives the error, it is
> probably going to either exit or retry. Exit will cause the storages
> to be freed along with the map, and retry, if successful, needs the
> storage to be published anyway. That is the reasoning for thinking it
> is okay.
It should not be hard. Stay with the existing approach: lookup old,
found => reuse, not-found => alloc. Only publish the new storage after
the attach has succeeded.

>
> > > -	next->attach_type = storage->key.attach_type;
> > The map dump (e.g. bpftool map dump) will also show attach_type zero
> > in the key now. Please also mention that in the commit message.
>
> Will fix in v3.
>
> > This patch allows a cgroup-storage to be shared among different bpf-progs,
> > which is in the right direction to make bpf_cgroup_storage_map behave
> > more like other bpf-maps do. However, each bpf-prog can still only use
> > one "bpf_cgroup_storage_map" (excluding the difference in the
> > SHARED/PERCPU bpf_cgroup_storage_type).
> > i.e. each bpf-prog can only access one type of cgroup-storage.
> > e.g. prog-A stores storage-A. If prog-B wants to store storage-B and
> > also read storage-A, it is not possible if I read it correctly.
>
> You are correct. It is still bound by the per-cpu variable, and
> because the per-cpu variable is an array of storages, one for each
> type, it still does not support more than one storage per type.
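
For reference, this is roughly where that per-type limit lives today
(quoting from memory, so the exact kernel lines may differ slightly):

	/* kernel/bpf/local_storage.c: one slot per storage type, per cpu */
	DEFINE_PER_CPU(struct bpf_cgroup_storage*,
		       bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);

	/* kernel/bpf/helpers.c, bpf_get_local_storage(): the helper only
	 * reads the single slot matching the map's storage type, so a
	 * prog cannot reference two SHARED (or two PERCPU) cgroup-storage
	 * maps at the same time.
	 */
	enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
	struct bpf_cgroup_storage *storage;

	storage = this_cpu_read(bpf_cgroup_storage[stype]);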

>
> > While I think this patch is a fine extension to the existing
> > bpf_cgroup_storage_map and a good step forward to make
> > bpf_cgroup_storage_map sharable like other bpf maps do, have you
> > looked at bpf_sk_storage.c, which also defines a local storage for
> > a sk but where a bpf prog can define multiple storages to be stored
> > in a sk? It is doing a similar thing to this patch (e.g. a link to
> > the storage, another link to the map, the lifetime of the storage is
> > tied to the map and the sk, etc.). KP Singh is generalizing it such
> > that a bpf-prog can store data in potentially any kernel object other
> > than sk [1]. His use case is to store data in an inode. I think it
> > can be used for the cgroup also. The only thing missing there is the
> > "PERCPU" type. It was not there because there is no such need for sk,
> > but it should be something quite doable.
>
> I think this is a good idea, but it will be a much bigger project. I
> would prefer to fix this problem of cgroup_storage not being shareable
> first, and when KP's patches land I'll take a look at how to reuse
> their code. And, as you said, it would be more involved due to needing
> to add "PERCPU" support.
I am not against this patch, considering it is a feature extension to
the existing bpf_cgroup_storage_map. The bpf_local_storage work is
still in progress, so the timing may not align anyway.

However, it is not very complicated to create storage for another
kernel object either. You should take a look at KP Singh's
bpf_inode_storage.c, which shows it is pretty straightforward to create
storage for an inode. The locking/racing issues discussed in this
thread have already been considered in the bpf_(sk|local)_storage.

Also, if the generalized bpf_local_storage is used, a completely
separate new map will be cleaner (i.e. completely separate from
bpf_cgroup_storage_map, and not much of it will be reused).
e.g. unlike bpf_cgroup_storage_map, there is no need to create storage
at attach time. The storage will be created at runtime when the very
first bpf-prog accesses it, much like bpf_sk_storage does today (see
the sketch below).
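
To illustrate that runtime-creation model, here is a rough sketch of how
the existing sk-storage is used from a prog (the value struct, map name,
and prog name below are made up for illustration; a cgroup variant would
need its own map type and helper):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* per-socket value; created lazily, not at attach time */
struct my_storage {
	__u64 invoked;
};

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct my_storage);
} sk_stg_map SEC(".maps");

SEC("sockops")
int count_sockops(struct bpf_sock_ops *ctx)
{
	struct bpf_sock *sk = ctx->sk;
	struct my_storage *stg;

	if (!sk)
		return 1;

	/* The storage for this sk does not exist until the very first
	 * lookup with BPF_SK_STORAGE_GET_F_CREATE; nothing is allocated
	 * at attach time.
	 */
	stg = bpf_sk_storage_get(&sk_stg_map, sk, NULL,
				 BPF_SK_STORAGE_GET_F_CREATE);
	if (!stg)
		return 1;

	stg->invoked++;
	return 1;
}

char _license[] SEC("license") = "GPL";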