On Mon, Jun 29, 2020 at 06:01:00PM +0200, KP Singh wrote:
> Thanks for your feedback! Apologies it took some time for me
> to incorporate this into another revision.
>
> On 18-Jun 23:43, Martin KaFai Lau wrote:
> > On Wed, Jun 17, 2020 at 10:29:38PM +0200, KP Singh wrote:
> > > From: KP Singh <kpsingh@xxxxxxxxxx>
> > >
> > > Refactor the functionality in bpf_sk_storage.c so that concept of
> > > storage linked to kernel objects can be extended to other objects like
> > > inode, task_struct etc.
> > >
> > > bpf_sk_storage is updated to be bpf_local_storage with a union that
> > > contains a pointer to the owner object. The type of the
> > > bpf_local_storage can be determined using the newly added
> > > bpf_local_storage_type enum.
> > >
> > > Each new local storage will still be a separate map and provide its own
> > > set of helpers. This allows for future object specific extensions and
> > > still share a lot of the underlying implementation.
> > Thanks for taking up this effort to refactor sk_local_storage.
> >
> > I took a quick look. I have some comments and would like to explore
> > some thoughts.
> >
> > > --- a/net/core/bpf_sk_storage.c
> > > +++ b/kernel/bpf/bpf_local_storage.c
> > > @@ -1,19 +1,22 @@
> > >  // SPDX-License-Identifier: GPL-2.0
> > >  /* Copyright (c) 2019 Facebook */
> > > +#include "linux/bpf.h"
> > > +#include "asm-generic/bug.h"
> > > +#include "linux/err.h"
> > "<" ">"
> >
> > >  #include <linux/rculist.h>
> > >  #include <linux/list.h>
> > >  #include <linux/hash.h>
> > >  #include <linux/types.h>
> > >  #include <linux/spinlock.h>
> > >  #include <linux/bpf.h>
> > > -#include <net/bpf_sk_storage.h>
> > > +#include <linux/bpf_local_storage.h>
> > >  #include <net/sock.h>
> > >  #include <uapi/linux/sock_diag.h>
> > >  #include <uapi/linux/btf.h>
> > >
> > >  static atomic_t cache_idx;
> > inode local storage and sk local storage probably need a separate
> > cache_idx. An improvement on picking cache_idx has just landed also.
>
> I see, thanks! I rebased and I now see that cache_idx is now a:
>
>   static u64 cache_idx_usage_counts[BPF_STORAGE_CACHE_SIZE];
>
> which tracks the free cache slots rather than using a single atomic
> cache_idx. I guess all types of local storage can share this now,
> right?

I believe they have to be separated. A sk-storage will not be
cached/stored in an inode. Caching a sk-storage at idx=0 of a sk
should not stop an inode-storage from being cached at the same idx
of an inode.

>
> >
> > [ ... ]
> >
> > > +struct bpf_local_storage {
> > > +	struct bpf_local_storage_data __rcu *cache[BPF_STORAGE_CACHE_SIZE];
> > >  	return NULL;
> > [...]
> > >  }
> > >
> > > -/* sk_storage->lock must be held and selem->sk_storage == sk_storage.
> > > +static void __unlink_local_storage(struct bpf_local_storage *local_storage,
> > > +			bool uncharge_omem)
> > Nit. indent is off. There are a few more cases like this.
>
> Thanks, will fix this. (Note to self: don't trust the editor's
> clang-format blindly.)
>
> >
> > > +{
> > > +	struct sock *sk;
> > > +
> > > +	switch (local_storage->stype) {
> > Does it need a new bpf_local_storage_type? Is map_type as good?
> >
> > Instead of adding any new member (e.g. stype) to
> > "struct bpf_local_storage", can the smap pointer be directly used
> > here instead?
> >
> > For example in __unlink_local_storage() here, it should
> > have a hold to the selem which then has a hold to smap.
>
> Good point, updated to using the map->map_type.
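
Just to spell out the shape of that idea for anyone following along,
here is a toy, userspace-compilable mock. None of the struct layouts,
names, or enum values below are the kernel's; they only illustrate
dispatching on the map type that is already reachable through the
selem's smap pointer, instead of keeping a separate stype member in
bpf_local_storage:

	#include <stdio.h>

	/* Made-up stand-ins for the kernel types; layouts are illustrative only. */
	enum mock_map_type {
		MOCK_MAP_TYPE_SK_STORAGE,
		MOCK_MAP_TYPE_INODE_STORAGE,
	};

	struct mock_bpf_map {
		enum mock_map_type map_type;
	};

	struct mock_local_storage_map {
		struct mock_bpf_map map;	/* embedded, like bpf_map inside bpf_local_storage_map */
	};

	struct mock_local_storage_elem {
		struct mock_local_storage_map *smap;	/* the selem already holds its map */
	};

	/* Dispatch on the map type reachable through the selem, so the storage
	 * itself does not need to carry a separate storage-type enum.
	 */
	static void mock_unlink_owner(struct mock_local_storage_elem *selem)
	{
		switch (selem->smap->map.map_type) {
		case MOCK_MAP_TYPE_SK_STORAGE:
			printf("unlink storage from its sock owner\n");
			break;
		case MOCK_MAP_TYPE_INODE_STORAGE:
			printf("unlink storage from its inode owner\n");
			break;
		}
	}

	int main(void)
	{
		struct mock_local_storage_map sk_map = {
			.map = { .map_type = MOCK_MAP_TYPE_SK_STORAGE },
		};
		struct mock_local_storage_elem selem = { .smap = &sk_map };

		mock_unlink_owner(&selem);
		return 0;
	}
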
> > >
> > > +	case BPF_LOCAL_STORAGE_SK:
> > > +		sk = local_storage->sk;
> > > +		if (uncharge_omem)
> > > +			atomic_sub(sizeof(struct bpf_local_storage),
> > > +				   &sk->sk_omem_alloc);
> > > +
> > > +		/* After this RCU_INIT, sk may be freed and cannot be used */
> > > +		RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
> > > +		local_storage->sk = NULL;
> > > +		break;
> > > +	}
> > Another thought on the stype switch cases.
> >
> > Instead of having multiple switches on stype in bpf_local_storage.c, which may
> > not scale well if we are planning to support a few more kernel objects,
> > have you considered putting them into their own "ops"? Maybe a few new
> > ops can be added to bpf_map_ops to do local storage unlink/update/alloc...etc.
>
> Good idea, I was able to refactor this with the following ops:
>
> 	/* Functions called by bpf_local_storage maps */
> 	void (*map_local_storage_unlink)(struct bpf_local_storage *local_storage,
> 					 bool uncharge_omem);
> 	struct bpf_local_storage_elem *(*map_selem_alloc)(
> 		struct bpf_local_storage_map *smap, void *owner, void *value,
> 		bool charge_omem);
> 	struct bpf_local_storage_data *(*map_local_storage_update)(
> 		void *owner, struct bpf_map *map, void *value, u64 flags);
> 	int (*map_local_storage_alloc)(void *owner,
> 				       struct bpf_local_storage_map *smap,
> 				       struct bpf_local_storage_elem *elem);
>
> Let me know if you have any particular thoughts/suggestions about
> this.

Makes sense to me. It seems the fast-path lookup can all be done within
__local_storage_lookup(), so no indirect call is needed there, which
should be good. (A rough sketch of what I mean is at the end of this
mail.)

>
> >
> > > +}
> > > +
> > > +/* local_storage->lock must be held and selem->local_storage == local_storage.
> > >   * The caller must ensure selem->smap is still valid to be
> > >   * dereferenced for its smap->elem_size and smap->cache_idx.
> > > + *
> > > + * uncharge_omem is only relevant when:
> > [...]
> > > +	/* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
> > >  	 * the map_alloc_check() side also does.
> > >  	 */
> > >  	if (!bpf_capable())
> > > @@ -1025,10 +1127,10 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
> > >  }
> > >  EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);
> > Would it be cleaner to leave bpf_sk specific function, map_ops, and func_proto
> > in net/core/bpf_sk_storage.c?
>
> Sure, I can also keep the sk_clone code there as well for now.

Just came to my mind: for easier review, it may be better to first do
the refactoring/renaming within bpf_sk_storage.c and then create
another patch to move the common parts to a new file
bpf_local_storage.c. Not sure if it will indeed be easier to read the
diff in practice. I should probably be able to follow it either way.

>
> >
> > There is a test in map_tests/sk_storage_map.c, in case you may not notice.
>
> I will try to make it generic as a part of this series. If it takes
> too much time, I will send a separate patch for testing
> inode_storage_map, and till then we have some assurance with
> test_local_storage in test_progs.

Sure, no problem. It is mostly for you to test sk_storage to make sure
things still work ;) It also gives some ideas on what racing conditions
have been considered in the sk_storage test, and maybe the inode
storage test will want to stress similar code paths.
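
Coming back to the fast-path point above, here is a rough,
userspace-compilable mock of the cached lookup I had in mind. The
names, layouts, and cache size are made up for illustration and are
not the kernel's; the point is only that the cache probe needs nothing
owner-specific:

	#include <stdio.h>

	#define MOCK_CACHE_SIZE 16

	/* Made-up stand-ins for the kernel types; layouts are illustrative only. */
	struct mock_storage_map {
		unsigned int cache_idx;		/* cache slot assigned to this map */
	};

	struct mock_storage_data {
		struct mock_storage_map *smap;	/* which map this value belongs to */
		long value;
	};

	struct mock_local_storage {
		struct mock_storage_data *cache[MOCK_CACHE_SIZE];
	};

	/* Fast path: an array probe plus a pointer compare.  Nothing here depends
	 * on whether the owner is a sock or an inode, so no indirect call through
	 * per-owner ops is needed.
	 */
	static struct mock_storage_data *
	mock_local_storage_lookup(struct mock_local_storage *local_storage,
				  struct mock_storage_map *smap)
	{
		struct mock_storage_data *sdata = local_storage->cache[smap->cache_idx];

		if (sdata && sdata->smap == smap)
			return sdata;

		/* Cache miss: a real implementation would walk the storage's selem
		 * list and refill the cache; omitted in this mock.
		 */
		return NULL;
	}

	int main(void)
	{
		struct mock_local_storage storage = { { NULL } };
		struct mock_storage_map map = { .cache_idx = 3 };
		struct mock_storage_data data = { .smap = &map, .value = 42 };
		struct mock_storage_data *hit;

		storage.cache[map.cache_idx] = &data;
		hit = mock_local_storage_lookup(&storage, &map);
		printf("cached value: %ld\n", hit ? hit->value : -1L);
		return 0;
	}

Only the miss path and the unlink/alloc/update operations would need to
go through the per-owner ops; the cached hit stays common code.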