On 10/17/22 12:11 PM, Yosry Ahmed wrote:
> On Mon, Oct 17, 2022 at 12:07 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 17, 2022 at 11:47 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 17, 2022 at 11:43 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 17, 2022 at 11:26 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx>
wrote:
> > > > >
> > > > > On Mon, Oct 17, 2022 at 11:02 AM <sdf@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 10/13, Yonghong Song wrote:
> > > > > > > Similar to sk/inode/task storage, implement similar cgroup local
storage.
> > > > > >
> > > > > > > There already exists a local storage implementation for
cgroup-attached
> > > > > > > bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
> > > > > > > bpf_get_local_storage(). But there are use cases such that
non-cgroup
> > > > > > > attached bpf progs wants to access cgroup local storage data.
For example,
> > > > > > > tc egress prog has access to sk and cgroup. It is possible to use
> > > > > > > sk local storage to emulate cgroup local storage by storing data in
> > > > > > > socket.
> > > > > > > But this is a waste as it could be lots of sockets belonging to a
> > > > > > > particular
> > > > > > > cgroup. Alternatively, a separate map can be created with cgroup
id as
> > > > > > > the key.
> > > > > > > But this will introduce additional overhead to manipulate the
new map.
> > > > > > > A cgroup local storage, similar to existing sk/inode/task storage,
> > > > > > > should help for this use case.
> > > > > >
> > > > > > > The life-cycle of storage is managed with the life-cycle of the
> > > > > > > cgroup struct. i.e. the storage is destroyed along with the
owning cgroup
> > > > > > > with a callback to the bpf_cgroup_storage_free when cgroup itself
> > > > > > > is deleted.
> > > > > >
> > > > > > > The userspace map operations can be done by using a cgroup fd as
a key
> > > > > > > passed to the lookup, update and delete operations.
> > > > > >
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old
cgroup
> > > > > > > local
> > > > > > > storage support, the new map name
BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE is
> > > > > > > used
> > > > > > > for cgroup storage available to non-cgroup-attached bpf
programs. The two
> > > > > > > helpers are named as bpf_cgroup_local_storage_get() and
> > > > > > > bpf_cgroup_local_storage_delete().
> > > > > >
> > > > > > Have you considered doing something similar to 7d9c3427894f ("bpf:
Make
> > > > > > cgroup storages shared between programs on the same cgroup") where
> > > > > > the map changes its behavior depending on the key size (see
key_size checks
> > > > > > in cgroup_storage_map_alloc)? Looks like sizeof(int) for fd still
> > > > > > can be used so we can, in theory, reuse the name..
> > > > > >
> > > > > > Pros:
> > > > > > - no need for a new map name
> > > > > >
> > > > > > Cons:
> > > > > > - existing BPF_MAP_TYPE_CGROUP_STORAGE is already messy; might be
not a
> > > > > > good idea to add more stuff to it?
> > > > > >
> > > > > > But, for the very least, should we also extend
> > > > > > Documentation/bpf/map_cgroup_storage.rst to cover the new map? We've
> > > > > > tried to keep some of the important details in there..
> > > > >
> > > > > This might be a long shot, but is it possible to switch completely to
> > > > > this new generic cgroup storage, and for programs that attach to
> > > > > cgroups we can still do lookups/allocations during attachment like we
> > > > > do today? IOW, maintain the current API for cgroup progs but switch it
> > > > > to use this new map type instead.
> > > > >
> > > > > It feels like this map type is more generic and can be a superset of
> > > > > the existing cgroup storage, but I feel like I am missing something.
> > > >
> > > > I feel like the biggest issue is that the existing
> > > > bpf_get_local_storage helper is guaranteed to always return non-null
> > > > and the verifier doesn't require the programs to do null checks on it;
> > > > the new helper might return NULL making all existing programs fail the
> > > > verifier.
> > >
> > > What I meant is, keep the old bpf_get_local_storage helper only for
> > > cgroup-attached programs like we have today, and add a new generic
> > > bpf_cgroup_local_storage_get() helper.
> > >
> > > For cgroup-attached programs, make sure a cgroup storage entry is
> > > allocated and hooked to the helper on program attach time, to keep
> > > today's behavior constant.
> > >
> > > For other programs, the bpf_cgroup_local_storage_get() will do the
> > > normal lookup and allocate if necessary.
> > >
> > > Does this make any sense to you?
> >
> > But then you also need to somehow mark these to make sure it's not
> > possible to delete them as long as the program is loaded/attached? Not
> > saying it's impossible, but it's a bit of a departure from the
> > existing common local storage framework used by inode/task; not sure
> > whether we want to pull all this complexity in there? But we can
> > definitely try if there is a wider agreement..
>
> I agree that it's not ideal, but it feels like we are comparing two
> non-ideal options anyway, I am just throwing ideas around :)