Re: [PATCH v3 1/6] cgroup: Allow registration and lookup of cgroup private data

Roman Gushchin <guro@xxxxxx> · Tue, 13 Mar 2018 22:09:16 +0000



On Tue, Mar 13, 2018 at 02:47:45PM -0700, Alexei Starovoitov wrote:
> On 3/13/18 2:37 PM, Roman Gushchin wrote:
> > On Tue, Mar 13, 2018 at 02:27:58PM -0700, Alexei Starovoitov wrote:
> > > On 3/13/18 1:50 PM, Tejun Heo wrote:
> > > > Hello, Matt.
> > > > 
> > > > cc'ing Roman and Alexei.
> > > > 
> > > > On Tue, Mar 06, 2018 at 03:46:55PM -0800, Matt Roper wrote:
> > > > > There are cases where other parts of the kernel may wish to store data
> > > > > associated with individual cgroups without building a full cgroup
> > > > > controller.  Let's add interfaces to allow them to register and lookup
> > > > > this private data for individual cgroups.
> > > > > 
> > > > > A kernel system (e.g., a driver) that wishes to register private data
> > > > > for a cgroup will do so by subclassing the 'struct cgroup_priv'
> > > > > structure to describe the necessary data to store.  Before registering a
> > > > > private data structure to a cgroup, the caller should fill in the 'key'
> > > > > and 'free' fields of the base cgroup_priv structure.
> > > > > 
> > > > >  * 'key' should be a unique void* that will act as a key for future
> > > > >    privdata lookups/removals.  Note that this allows drivers to store
> > > > >    per-device private data for a cgroup by using a device pointer as a key.
> > > > > 
> > > > >  * 'free' should be a function pointer to a function that may be used
> > > > >    to destroy the private data.  This function will be called
> > > > >    automatically if the underlying cgroup is destroyed.
> > > > 
> > > > This feature turned out to have more users than I originally
> > > > anticipated and bpf also wants something like this to track network
> > > > states.  The requirements are pretty similar but not quite the same.
> > > > The extra requirements are...
> > > > 
> > > > * Lookup must be really cheap.  Probably using pointer hash or walking
> > > >   list isn't great, so maybe idr based lookup + RCU protected index
> > > >   table per cgroup?
> > > > 
> > > > * It should support both regular memory and percpu pointers.  Given
> > > >   that what cgroup does is pretty much cgroup:key -> pointer lookup,
> > > >   it's mostly about getting the interface right so that it's not too
> > > >   error-prone.
> > > 
> > > from bpf side there should be _zero_ lookups.
> > > If bpf do a lookup it can equally use its own map to do that.
> > > 
> > > From bpf program point of view it will look like:
> > > struct my_data {
> > >   u64 a;
> > >   u32 b;
> > > } *ptr;
> > > ptr = bpf_get_cgroup_buf(skb, sizeof(struct my_data));
> > > 
> > > bpf_get_cgroup_buf() is lookup-free. Worst case it will do pointer
> > > dereferences
> > > skb->sk->sk_cgrp_data->val to get to cgroup and from cgroup to get pointer
> > > to the buffer.
> > 
> > Having strictly one buffer per-cgroup sounds very limiting.
> > What if two independent bpf programs start using it?
> > 
> > > In good case it may be optimized (inlined) by the verifier into absolute
> > > address of that cgroup scratch buffer at attach time.
> > 
> > Maybe we can have an idr-based index table (as Tejun suggested above),
> > but avoid performance penalty by optimizing the lookup out at the attach time?
> 
> it has to be zero lookups. If idr lookup is involved, it's cleaner
> to add idr as new bpf map type and use cgroup ino as an id.
> 

Hm, what if we will have a buffer attached to a bpf program attached to a cgroup?
Then we will have zero lookups on a hot path, but independent bpf programs
will be able to use their own buffers.
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html