Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes

Paul Moore <paul@xxxxxxxxxxxxxx> · Thu, 14 Sep 2023 21:10:26 -0400

On Thu, Sep 14, 2023 at 5:57 PM Jacob Satterfield
<jsatterfield.linux@xxxxxxxxx> wrote:
>
> On Tue, Sep 12, 2023 at 11:23 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >
> > On Sep  6, 2023 Stephen Smalley <stephen.smalley.work@xxxxxxxxx> wrote:
> > >
> > > The current avtab hashtable employs a separate chaining collision
> > > resolution strategy where each bucket/chain holds an ordered linked list
> > > of pointers to kmem_cache allocated avtab_node elements.
> > >
> > > On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> > > uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> > > A call to kmem_cache_zalloc() is required for every single rule, which
> > > in the default policy is currently 96,730 and continually rising.
> > >
> > > When both sets of avtab_node (regular and cond.) are turned into arrays
> > > with the hash table slot indexing into it rather than a pointer, then
> > > this results in only 2 additional kvcalloc() calls and the complete
> > > removal of the kmem_cache itself.
> > >
> > > The caching characteristics of iterating a single array are better due
> > > to locality of reference. Running "perf stat -r 100 -d load_policy"
> > > has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> > > with this single patch. Future patches focused on improving the hash
> > > table's collision resolution strategy and array layout (struct-of-arrays
> > > vs. array-of-structs) may elicit even more caching and therefore runtime
> > > performance improvements.
> > >
> > > On 64-bit machines, the memory usage of the hash table slots is cut in
> > > half due to the use of u32 indices instead of memory pointers. Since
> > > ~65k hash slots are used between the regular and cond. tables with the
> > > default Fedora 38 policy, this equates to around 256KB of memory saved.
> > >
> > > Notes:
> > >
> > > A couple helper functions avtab_get_chain() and avtab_get_node() are
> > > introduced to provide more standardized interfaces for use in functions
> > > that need to search through the hash table.
> > >
> > > NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> > > in the hash table or the "next" member of an avtab_node are valid.
> > >
> > > This patch causes the cond. rules table to waste memory as the size
> > > requested for the kvcalloc() is the size of the regular rules table.
> > > These tables rarely, if ever, have the same number of rules in practice.
> > > The next patch addresses this shortcoming.
> >
> > Isn't the more important issue that of there being more conditional
> > rules than regular rules?  Yes, I'm sure this is very unlikely, but
> > given just this patch wouldn't this cause a problem?
> >
> > It is important to remember that even when combined in a patchset,
> > any given patch should not break the system.  A patch can depend on
> > prior patches in the patchset, but not upcoming patches.
> >
> > I've only looked very briefly at patch 2/3, but I suspect at the very
> > least you may need to squash patches 1/3 and 2/3 to ensure there is
> > no breakage.
> >
> > More comments below (all are in the context of patch 1/3, some may not
> > be relevant in the context of 1/3+2/3).
>
> I will squash the two in the next version.
>
> > > Signed-off-by: Jacob Satterfield <jsatterfield.linux@xxxxxxxxx>
> > > Reviewed-by: Stephen Smalley <stephen.smalley.work@xxxxxxxxx>
> > > ---
> > >  security/selinux/ss/avtab.c       | 75 +++++++++++++++----------------
> > >  security/selinux/ss/avtab.h       | 28 ++++++++++--
> > >  security/selinux/ss/conditional.c | 13 +++---
> > >  security/selinux/ss/services.c    | 20 +++++----
> > >  4 files changed, 81 insertions(+), 55 deletions(-)

...
                                              0, SLAB_PANIC, NULL);
> > > diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> > > index 3c3904bf02b0..72708aed5fff 100644
> > > --- a/security/selinux/ss/avtab.h
> > > +++ b/security/selinux/ss/avtab.h
> > > @@ -77,16 +77,38 @@ struct avtab_datum {
> > >  struct avtab_node {
> > >       struct avtab_key key;
> > >       struct avtab_datum datum;
> > > -     struct avtab_node *next;
> > > +     u32 next;
> >
> > Given that one of the common avtab lookup operations is to simply walk
> > the list, using the next field, why not keep the next field as a
> > pointer?  Sure, you loose the 64-bit to 32-bit size reduction on 64-bit
> > systems (although I wonder if alignment issues rendered that moot), but
> > I would expect the walk to be quicker if you only needed to do a single
> > next pointer lookup as opposed to the array indexing.
> >
> > I may be missing something, but I believe this would also remove the
> > need for the NULL_NODE_IDX sentinel.
> >
>
> The conditional avtab is the reason an index was used instead of a
> pointer. Because it doesn't have the right hint when allocating the
> nodes array, we have to copy the nodes array into a smaller buffer
> after reading the rules (otherwise we waste about 1.7 MB of memory).
> The copy invalidates the pointers that each "next" refers to as they
> would be pointing to free'd memory instead of the new nodes array.
> However, I agree that the walk would be quicker with the direct
> pointer reference instead of the array index and with the way the
> struct alignment works, the index here doesn't save memory.
>
> An alternative I have prototyped is to use pointer math to calculate
> an offset from the original nodes array to where "next" is pointing to
> and add it from the new array to calculate the new "next" value. This
> changes "next" back to a pointer and gets rid of the indirection.
>
> I agree that the NULL_NODE_IDX sentinel is not very ergonomic, however
> it's at the cost of 256 KB of memory in relation to the htable slots.
> Mainly due to the difficulty of telling when a slot is empty with an
> index vs. a pointer. I could leave the htable slots as indexes, but it
> would still require some form of sentinel value to indicate the
> emptiness of a slot. What would you prefer?

In addition to other changes we've discussed, it looks like squashing
1/3 and 2/3 is going to change things quite a bit, so I'm going to
suggest you make the changes you're planning to make, including
squashing the two patches, and repost.  We can re-review the new
revision and pick up the discussion from there.

Does that sound okay?

-- 
paul-moore.com