Re: [RFC PATCH] selinux: assorted hash table improvements

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 14 Nov 2023 09:38:44 -0800

On Mon, 13 Nov 2023 at 22:17, Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>
> This is an inline patch version of a patch submitted by Linus in the
> thread below.  Do not merge this patch anywhere without additional
> testing or a proper commit message.
>
> * https://lore.kernel.org/selinux/20231111160954.45911-2-paul@xxxxxxxxxxxxxx

A couple of notes on this.

The *point* of the patch basically boils down to this particular change:

> +static inline struct hashtab_node **hashtab_entry(struct hashtab *h,
> +       const void *key, const struct hashtab_key_params key_params)
> +{
> +       return h->htable + hash_32(key_params.hash(key), h->hbits);
> +}
...
> -        ... key_params.hash(key) & (h->size - 1);
> +       ... hashtab_entry(h, key, key_params);

which basically means that now instead of just using the low bits of
whatever hash value, it actually uses all bits of the 32-bit hash.

It doesn't much matter as things stand now, because

 (a) I'm not convinced of how performance-critical this is (there is
*very* performance-critical hashing in selinux, but it's mainly the
AVC lookup which doesn't use the hashtab model at all)

 (b) a lot of the hashing intentionally tries to put the bits in low
bits - exactly *because* this code used to just use the low bits. IOW,
we have things like this:

    static u32 rangetr_hash(const void *k)
    ...
        return key->source_type + (key->target_type << 3) +
                (key->target_class << 5);

which basically tries pretty hard to keep things in the low bits
rather than spread things out too much or too well

 (c) the code hasn't ever done a good job before

For example, with the old "filenametr_hash()" it used
partial_name_hash(), but then never did the final end_name_hash(), so
it was never even folded down to an u32, and the high bits of the hash
on a 64-bit machine were dropped entirely even before it then only
used the low bits thanks to that '& (h->size-1)' masking.

Anyway, for this patch to really matter, the hashing itself should
matter (questionable) and the existing hash functions like that
rangetr_hash() and filenametr_hash() should be better than they are.

Paul already sent out a patch to improve on filenametr_hash(),
although it is the case that full_name_hash() is really meant for
*counted* strings, and since filenametr_hash() has to do the
'strlen()' just to see how lonbg the string is, it's not exactly
efficient there either.

Oh well. We do have a "hash and count length a word at a time"
function, but that one is for pathnames, and stops at '/' characters
in addition to terminating NUL characters.

                Linus