On Tue, Mar 13, 2018 at 02:47:45PM -0700, Alexei Starovoitov wrote: > On 3/13/18 2:37 PM, Roman Gushchin wrote: > > On Tue, Mar 13, 2018 at 02:27:58PM -0700, Alexei Starovoitov wrote: > > > On 3/13/18 1:50 PM, Tejun Heo wrote: > > > > Hello, Matt. > > > > > > > > cc'ing Roman and Alexei. > > > > > > > > On Tue, Mar 06, 2018 at 03:46:55PM -0800, Matt Roper wrote: > > > > > There are cases where other parts of the kernel may wish to store data > > > > > associated with individual cgroups without building a full cgroup > > > > > controller. Let's add interfaces to allow them to register and lookup > > > > > this private data for individual cgroups. > > > > > > > > > > A kernel system (e.g., a driver) that wishes to register private data > > > > > for a cgroup will do so by subclassing the 'struct cgroup_priv' > > > > > structure to describe the necessary data to store. Before registering a > > > > > private data structure to a cgroup, the caller should fill in the 'key' > > > > > and 'free' fields of the base cgroup_priv structure. > > > > > > > > > > * 'key' should be a unique void* that will act as a key for future > > > > > privdata lookups/removals. Note that this allows drivers to store > > > > > per-device private data for a cgroup by using a device pointer as a key. > > > > > > > > > > * 'free' should be a function pointer to a function that may be used > > > > > to destroy the private data. This function will be called > > > > > automatically if the underlying cgroup is destroyed. > > > > > > > > This feature turned out to have more users than I originally > > > > anticipated and bpf also wants something like this to track network > > > > states. The requirements are pretty similar but not quite the same. > > > > The extra requirements are... > > > > > > > > * Lookup must be really cheap. Probably using pointer hash or walking > > > > list isn't great, so maybe idr based lookup + RCU protected index > > > > table per cgroup? > > > > > > > > * It should support both regular memory and percpu pointers. Given > > > > that what cgroup does is pretty much cgroup:key -> pointer lookup, > > > > it's mostly about getting the interface right so that it's not too > > > > error-prone. > > > > > > from bpf side there should be _zero_ lookups. > > > If bpf do a lookup it can equally use its own map to do that. > > > > > > From bpf program point of view it will look like: > > > struct my_data { > > > u64 a; > > > u32 b; > > > } *ptr; > > > ptr = bpf_get_cgroup_buf(skb, sizeof(struct my_data)); > > > > > > bpf_get_cgroup_buf() is lookup-free. Worst case it will do pointer > > > dereferences > > > skb->sk->sk_cgrp_data->val to get to cgroup and from cgroup to get pointer > > > to the buffer. > > > > Having strictly one buffer per-cgroup sounds very limiting. > > What if two independent bpf programs start using it? > > > > > In good case it may be optimized (inlined) by the verifier into absolute > > > address of that cgroup scratch buffer at attach time. > > > > Maybe we can have an idr-based index table (as Tejun suggested above), > > but avoid performance penalty by optimizing the lookup out at the attach time? > > it has to be zero lookups. If idr lookup is involved, it's cleaner > to add idr as new bpf map type and use cgroup ino as an id. > Hm, what if we will have a buffer attached to a bpf program attached to a cgroup? Then we will have zero lookups on a hot path, but independent bpf programs will be able to use their own buffers. -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html