Re: [PATCH v2] selinux: reduce the use of hard-coded hash sizes

Paul Moore <paul@xxxxxxxxxxxxxx> · Sun, 23 Feb 2020 10:01:53 -0500

On Sun, Feb 23, 2020 at 9:46 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> On Sat, Feb 22, 2020 at 8:38 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > On Wed, Feb 19, 2020 at 4:33 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > >
> > > Instead allocate hash tables with just the right size based on the
> > > actual number of elements (which is almost always known beforehand, we
> > > just need to defer the hashtab allocation to the right time). The only
> > > case when we don't know the size (with the current policy format) is the
> > > new filename transitions hashtable. Here I just left the existing value.
> > >
> > > After this patch, the time to load Fedora policy on x86_64 decreases
> > > from 950 ms to 220 ms. If the unconfined module is removed, it decreases
> > > from 870 ms to 170 ms. It is also likely that other operations are going
> > > to be faster, mainly string_to_context_struct() or mls_compute_sid(),
> > > but I didn't try to quantify that.
> > >
> > > The memory usage increases a bit after this patch, but only by ~1-2 MB
> > > (it is hard to measure precisely). I believe it is a small price to pay
> > > for the increased performance.
> > >
> > > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> > > ---
> > >
> > > Changed in v2:
> > >  - guard against h->size == 0 in hashtab_search() and hashtab_insert()
> > >
> > >  security/selinux/ss/hashtab.c  | 25 +++++++++++++---
> > >  security/selinux/ss/hashtab.h  |  2 +-
> > >  security/selinux/ss/policydb.c | 53 +++++++++++++---------------------
> > >  security/selinux/ss/policydb.h |  2 --
> > >  4 files changed, 42 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> > > index ebfdaa31ee32..87ad83148cbd 100644
> > > --- a/security/selinux/ss/hashtab.c
> > > +++ b/security/selinux/ss/hashtab.c
> > > @@ -12,12 +12,26 @@
> > >
> > >  static struct kmem_cache *hashtab_node_cachep;
> > >
> > > +static u32 hashtab_compute_size(u32 nel)
> > > +{
> > > +       u32 size;
> > > +
> > > +       if (nel <= 2)
> > > +               return nel;
> > > +
> > > +       /* use the nearest power of two to balance size and access time */
> > > +       size = roundup_pow_of_two(nel);
> > > +       if (size - nel > size / 4)
> > > +               size /= 2;
> >
> > It would be nice if the commit description (and possibly the code
> > itself via a shorter version in the comments) gave some insight into
> > why you chose this particular adjustment to the hash table size.  Was
> > this based on observations with real world policies?  Just a gut
> > feeling?  Things like this are the sort of thing that one wonders
> > about five years later when modifying the code and by then no one can
> > remember if it is important or not.
> >
> > Also, considering the adjustment above, why not use
> > rounddown_pow_of_two() instead (perhaps coupled with a minimum size
> > check)?
>
> Good point. I think I was tuning this formula back when I haven't
> rebased this patch on top of the filename transition rework, so I
> suspect that the filename_trans hash table was what was causing high
> memory usage with plain roundup_pow_of_two()... With ~225k rules it
> should be taking up ~2 MB in such case, which is quite a lot. (In
> fact, even with the adjusted formula it would still allocate the same
> table, so my crude method of measuring the memory usage apparently
> wasn't reliable...) Now that this hash table is only at ~2500 elements
> even with our imperfect policy, it should only take up ~32 KB when
> rounding up, which is a pretty small overhead.
>
> I'll re-test the patch with hashtab_compute_size() ~
> roundup_pow_of_two(), this time logging the exact amount of bytes
> taken up by the hash table arrays (which is the only thing that is
> changing here). I expect it will show only a few 10s of KB difference,
> in which case I'll respin the patch with this simpler (and likely a
> bit faster) variant.

Thanks.  I also want to stress that capturing the logic behind the
decision in the patch description / comments is almost as important
(maybe more so) than what we end up deciding on for the actual size
algorithm.  I can almost guarantee the decision made now is going to
be brought up for discussion at some point in the future, so let's
help the poor future devs as much as we can ;)

-- 
paul moore
www.paul-moore.com