Re: [GIT PULL] SELinux patches for v5.8

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 3 Jun 2020 14:02:02 -0700




On Wed, Jun 03, 2020 at 10:37:47AM -0700, Linus Torvalds wrote:
> On Wed, Jun 3, 2020 at 10:20 AM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
> >
> > We could have inode->i_security be the blob, rather than a pointer to it.
> > That will have its own performance issues.
> 
> It wouldn't actually really fix anything, because the inode is so big
> and sparsely accessed that it doesn't even really help the cache
> density issue. Yeah, it gets rid of the pointer access, but that's
> pretty much it. The fact that we randomize the order means that we
> can't even really try to aim for any cache density.
> 
> And it would actually not be possible with the current layered
> security model anyway, since those blob sizes are dynamic at runtime.
> 
> If we had _only_ SELinux, we could perhaps have hidden the
> sid/sclass/task_sid directly in the inode (it would be only slightly
> larger than the pointer is, anyway), but even that ship sailed long
> long ago due to the whole "no security person can ever agree with
> another one on fundamentals".

Also there is bpf_lsm now that we're going to run it in production, 
so performance is as important as ever.
Traditional lsm-s have per-lsm per-inode blob.
For bpf that doesn't work, since progs come and go at run-time and
independent from each other.
So we need per-program per-inode blob.
To maintain good performance we've proposed:
@@ -740,6 +741,10 @@  struct inode {
 	struct fsverity_info	*i_verity_info;
 #endif
 
+#ifdef CONFIG_BPF_SYSCALL
+	struct bpf_local_storage __rcu	*inode_bpf_storage;
+#endif

https://patchwork.ozlabs.org/project/netdev/patch/20200526163336.63653-3-kpsingh@xxxxxxxxxxxx/

but got pushback, so we're going to use lsm style for now:
+static inline struct bpf_lsm_storage *bpf_inode(const struct inode *inode)
+{
+       if (unlikely(!inode->i_security))
+               return NULL;
+
+       return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
+}

which means extra kmalloc for every inode, extra pointer deref, global var
access, and additional math just to get to 'inode_bpf_storage' pointer.

We have similar pointer in 'struct sock' already:
#ifdef CONFIG_BPF_SYSCALL
        struct bpf_sk_storage __rcu     *sk_bpf_storage;
#endif
that is used by variety of networking bpf programs.
The commit 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
has benchmarking data for it:
  hash table with 8-byte key -> 152ns per bpf run 
  sk_bpf_storage -> 66ns per bpf run
Hashtable suppose to be O(1) with L1$ hit, but it turned out
to be noticeably slower than sk_bpf_storage.
We expect to see similar performance gains for inode_bpf_storage
vs hashtable approach that people use now.
Eventually we'll add task_bpf_storage as well.
Right now every other bpf tracing script is using pid as a key
in a separate hash table to store per-task data. For high frequency
events that adds up. task_bpf_storage will accelerate that.

Another way to look at it is shared inode->i_security across
different inodes won't work for us. We need something really
cheap like single 'inode_bpf_storage' pointer that is zero
most of the time and for few inodes bpf progs will keep their
scratch data in there.
For now lsm style bpf_inode() approach is ok-ish.
But we will come back when we collect perf numbers to justify
why direct pointer in the 'struct inode' is a win.