On 10/17/22 12:11 PM, Yosry Ahmed wrote:
> On Mon, Oct 17, 2022 at 12:07 PM Stanislav Fomichev
<sdf@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 17, 2022 at 11:47 AM Yosry Ahmed
<yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 17, 2022 at 11:43 AM Stanislav Fomichev
<sdf@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 17, 2022 at 11:26 AM Yosry Ahmed
<yosryahmed@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Oct 17, 2022 at 11:02 AM <sdf@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 10/13, Yonghong Song wrote:
> > > > > > > Similar to sk/inode/task storage, implement similar
cgroup local storage.
> > > > > >
> > > > > > > There already exists a local storage implementation for
cgroup-attached
> > > > > > > bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE
and helper
> > > > > > > bpf_get_local_storage(). But there are use cases such
that non-cgroup
> > > > > > > attached bpf progs wants to access cgroup local storage
data. For example,
> > > > > > > tc egress prog has access to sk and cgroup. It is
possible to use
> > > > > > > sk local storage to emulate cgroup local storage by
storing data in
> > > > > > > socket.
> > > > > > > But this is a waste as it could be lots of sockets
belonging to a
> > > > > > > particular
> > > > > > > cgroup. Alternatively, a separate map can be created
with cgroup id as
> > > > > > > the key.
> > > > > > > But this will introduce additional overhead to
manipulate the new map.
> > > > > > > A cgroup local storage, similar to existing
sk/inode/task storage,
> > > > > > > should help for this use case.
> > > > > >
> > > > > > > The life-cycle of storage is managed with the
life-cycle of the
> > > > > > > cgroup struct. i.e. the storage is destroyed along
with the owning cgroup
> > > > > > > with a callback to the bpf_cgroup_storage_free when
cgroup itself
> > > > > > > is deleted.
> > > > > >
> > > > > > > The userspace map operations can be done by using a
cgroup fd as a key
> > > > > > > passed to the lookup, update and delete operations.
> > > > > >
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been
used for old cgroup
> > > > > > > local
> > > > > > > storage support, the new map name
BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE is
> > > > > > > used
> > > > > > > for cgroup storage available to non-cgroup-attached bpf
programs. The two
> > > > > > > helpers are named as bpf_cgroup_local_storage_get() and
> > > > > > > bpf_cgroup_local_storage_delete().
> > > > > >
> > > > > > Have you considered doing something similar to
7d9c3427894f ("bpf: Make
> > > > > > cgroup storages shared between programs on the same
cgroup") where
> > > > > > the map changes its behavior depending on the key size
(see key_size checks
> > > > > > in cgroup_storage_map_alloc)? Looks like sizeof(int) for
fd still
> > > > > > can be used so we can, in theory, reuse the name..
> > > > > >
> > > > > > Pros:
> > > > > > - no need for a new map name
> > > > > >
> > > > > > Cons:
> > > > > > - existing BPF_MAP_TYPE_CGROUP_STORAGE is already messy;
might be not a
> > > > > > good idea to add more stuff to it?
> > > > > >
> > > > > > But, for the very least, should we also extend
> > > > > > Documentation/bpf/map_cgroup_storage.rst to cover the new
map? We've
> > > > > > tried to keep some of the important details in there..
> > > > >
> > > > > This might be a long shot, but is it possible to switch
completely to
> > > > > this new generic cgroup storage, and for programs that
attach to
> > > > > cgroups we can still do lookups/allocations during
attachment like we
> > > > > do today? IOW, maintain the current API for cgroup progs
but switch it
> > > > > to use this new map type instead.
> > > > >
> > > > > It feels like this map type is more generic and can be a
superset of
> > > > > the existing cgroup storage, but I feel like I am missing
something.
> > > >
> > > > I feel like the biggest issue is that the existing
> > > > bpf_get_local_storage helper is guaranteed to always return
non-null
> > > > and the verifier doesn't require the programs to do null
checks on it;
> > > > the new helper might return NULL making all existing programs
fail the
> > > > verifier.
> > >
> > > What I meant is, keep the old bpf_get_local_storage helper only
for
> > > cgroup-attached programs like we have today, and add a new generic
> > > bpf_cgroup_local_storage_get() helper.
> > >
> > > For cgroup-attached programs, make sure a cgroup storage entry is
> > > allocated and hooked to the helper on program attach time, to keep
> > > today's behavior constant.
> > >
> > > For other programs, the bpf_cgroup_local_storage_get() will do the
> > > normal lookup and allocate if necessary.
> > >
> > > Does this make any sense to you?
> >
> > But then you also need to somehow mark these to make sure it's not
> > possible to delete them as long as the program is
loaded/attached? Not
> > saying it's impossible, but it's a bit of a departure from the
> > existing common local storage framework used by inode/task; not sure
> > whether we want to pull all this complexity in there? But we can
> > definitely try if there is a wider agreement..
>
> I agree that it's not ideal, but it feels like we are comparing two
> non-ideal options anyway, I am just throwing ideas around :)