Re: [PATCH 3/4] selinux: prepare for inlining of hashtab functions

Ondrej Mosnacek <omosnace@xxxxxxxxxx> · Sat, 2 May 2020 11:11:43 +0200

On Fri, May 1, 2020 at 10:33 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> On Tue, Apr 28, 2020 at 8:55 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> >
> > Refactor searching and inserting into hashtabs to pave way for
> > converting hashtab_search() and hashtab_insert() to inline functions in
> > the next patch. This will avoid indirect calls and allow the compiler to
> > better optimize individual callers, leading to a drastic performance
> > improvement.
>
> This commit description describes the next patch in the series, and
> some of your motivation, but doesn't really tell me much about this
> patch other than it is a "refactoring".  I need more info here,
> especially considering my comment below.

Yes, the commit message needs some more love... See the clarification below.

>
> > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx>
> > ---
> >  security/selinux/ss/conditional.c |   4 +-
> >  security/selinux/ss/conditional.h |   2 +-
> >  security/selinux/ss/hashtab.c     |  44 +++++-----
> >  security/selinux/ss/hashtab.h     |  22 ++---
> >  security/selinux/ss/mls.c         |  23 +++---
> >  security/selinux/ss/policydb.c    | 128 +++++++++++++++++++-----------
> >  security/selinux/ss/policydb.h    |   9 +++
> >  security/selinux/ss/services.c    |  38 ++++-----
> >  security/selinux/ss/symtab.c      |  22 ++++-
> >  security/selinux/ss/symtab.h      |   3 +
> >  10 files changed, 178 insertions(+), 117 deletions(-)
>
> ...
>
> > diff --git a/security/selinux/ss/hashtab.h b/security/selinux/ss/hashtab.h
> > index 31c11511fe10..4885234257d4 100644
> > --- a/security/selinux/ss/hashtab.h
> > +++ b/security/selinux/ss/hashtab.h
> > @@ -13,6 +13,12 @@
> >
> >  #define HASHTAB_MAX_NODES      0xffffffff
> >
> > +struct hashtab_key_params {
> > +       u32 (*hash)(const void *key);   /* hash function */
> > +       int (*cmp)(const void *key1, const void *key2);
> > +                                       /* key comparison function */
> > +};
> > +
> >  struct hashtab_node {
> >         void *key;
> >         void *datum;
> > @@ -23,10 +29,6 @@ struct hashtab {
> >         struct hashtab_node **htable;   /* hash table */
> >         u32 size;                       /* number of slots in hash table */
> >         u32 nel;                        /* number of elements in hash table */
> > -       u32 (*hash_value)(struct hashtab *h, const void *key);
> > -                                       /* hash function */
> > -       int (*keycmp)(struct hashtab *h, const void *key1, const void *key2);
> > -                                       /* key comparison function */
>
> I don't like how you've split the hashing and comparison functions out
> of the hashtab struct and into their own data structure with no
> explicit linkage between the two.  This is a bad design decision in my
> opinion, and something we should try to avoid.

In general, I would totally agree with you, but in this case this is
crucial to avoid the indirect calls. Granted, the commit message fails
to explain this and I need to rewrite it (and the callback separation
probably deserves a comment in the code as well).

The thing is, if you store the callbacks in the hashtab struct, then
any function that didn't also initialize that hashtab has to fetch the
callbacks from the struct and call them indirectly (via something like
"call *%rax" in the case of x86_64, although in practice it will be
something more weird due to Spectre mitigations...). In C, there is no
other way to keep the hashtab generic without forcing the indirect
calls.

Note that rhashtable (see <linux/rhashtable.h>) does exactly the same
thing. When I first saw it, I also thought "what a horrible
interface", until I realized it is necessary to avoid the costly
indirect calls.

I tried to encapsulate the callback structs as much as possible -
symtab has them hidden nicely behind the specialized symtab_insert()
and symtab_search() functions and the other hashtab instances are
encapsulated in policydb.c (inserts are done always in that file and
for lookups I added specialized functions). Although now I realize I
could go one step further and extract all the policydb hashtab-related
code (read, write, lookup, destroy for each type of hashtab except
symtab) into a separate compilation unit (or even each type into its
own?)... The policydb.c file is getting quite big and unwieldy, so I
think it would be a good thing even as a separate patch below the
rest.

Does this alleviate your concerns regarding the design (assuming I
expand the commit message and keep the code using the generic hashtab
functions hidden behind a more high-level interface as suggested
above)?

-- 
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.