On Wed, Jun 03, 2020 at 10:37:47AM -0700, Linus Torvalds wrote: > On Wed, Jun 3, 2020 at 10:20 AM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote: > > > > We could have inode->i_security be the blob, rather than a pointer to it. > > That will have its own performance issues. > > It wouldn't actually really fix anything, because the inode is so big > and sparsely accessed that it doesn't even really help the cache > density issue. Yeah, it gets rid of the pointer access, but that's > pretty much it. The fact that we randomize the order means that we > can't even really try to aim for any cache density. > > And it would actually not be possible with the current layered > security model anyway, since those blob sizes are dynamic at runtime. > > If we had _only_ SELinux, we could perhaps have hidden the > sid/sclass/task_sid directly in the inode (it would be only slightly > larger than the pointer is, anyway), but even that ship sailed long > long ago due to the whole "no security person can ever agree with > another one on fundamentals". Also there is bpf_lsm now that we're going to run it in production, so performance is as important as ever. Traditional lsm-s have per-lsm per-inode blob. For bpf that doesn't work, since progs come and go at run-time and independent from each other. So we need per-program per-inode blob. To maintain good performance we've proposed: @@ -740,6 +741,10 @@ struct inode { struct fsverity_info *i_verity_info; #endif +#ifdef CONFIG_BPF_SYSCALL + struct bpf_local_storage __rcu *inode_bpf_storage; +#endif https://patchwork.ozlabs.org/project/netdev/patch/20200526163336.63653-3-kpsingh@xxxxxxxxxxxx/ but got pushback, so we're going to use lsm style for now: +static inline struct bpf_lsm_storage *bpf_inode(const struct inode *inode) +{ + if (unlikely(!inode->i_security)) + return NULL; + + return inode->i_security + bpf_lsm_blob_sizes.lbs_inode; +} which means extra kmalloc for every inode, extra pointer deref, global var access, and additional math just to get to 'inode_bpf_storage' pointer. We have similar pointer in 'struct sock' already: #ifdef CONFIG_BPF_SYSCALL struct bpf_sk_storage __rcu *sk_bpf_storage; #endif that is used by variety of networking bpf programs. The commit 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") has benchmarking data for it: hash table with 8-byte key -> 152ns per bpf run sk_bpf_storage -> 66ns per bpf run Hashtable suppose to be O(1) with L1$ hit, but it turned out to be noticeably slower than sk_bpf_storage. We expect to see similar performance gains for inode_bpf_storage vs hashtable approach that people use now. Eventually we'll add task_bpf_storage as well. Right now every other bpf tracing script is using pid as a key in a separate hash table to store per-task data. For high frequency events that adds up. task_bpf_storage will accelerate that. Another way to look at it is shared inode->i_security across different inodes won't work for us. We need something really cheap like single 'inode_bpf_storage' pointer that is zero most of the time and for few inodes bpf progs will keep their scratch data in there. For now lsm style bpf_inode() approach is ok-ish. But we will come back when we collect perf numbers to justify why direct pointer in the 'struct inode' is a win.