On Mon, 13 Nov 2023 at 22:17, Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > > This is an inline patch version of a patch submitted by Linus in the > thread below. Do not merge this patch anywhere without additional > testing or a proper commit message. > > * https://lore.kernel.org/selinux/20231111160954.45911-2-paul@xxxxxxxxxxxxxx A couple of notes on this. The *point* of the patch basically boils down to this particular change: > +static inline struct hashtab_node **hashtab_entry(struct hashtab *h, > + const void *key, const struct hashtab_key_params key_params) > +{ > + return h->htable + hash_32(key_params.hash(key), h->hbits); > +} ... > - ... key_params.hash(key) & (h->size - 1); > + ... hashtab_entry(h, key, key_params); which basically means that now instead of just using the low bits of whatever hash value, it actually uses all bits of the 32-bit hash. It doesn't much matter as things stand now, because (a) I'm not convinced of how performance-critical this is (there is *very* performance-critical hashing in selinux, but it's mainly the AVC lookup which doesn't use the hashtab model at all) (b) a lot of the hashing intentionally tries to put the bits in low bits - exactly *because* this code used to just use the low bits. IOW, we have things like this: static u32 rangetr_hash(const void *k) ... return key->source_type + (key->target_type << 3) + (key->target_class << 5); which basically tries pretty hard to keep things in the low bits rather than spread things out too much or too well (c) the code hasn't ever done a good job before For example, with the old "filenametr_hash()" it used partial_name_hash(), but then never did the final end_name_hash(), so it was never even folded down to an u32, and the high bits of the hash on a 64-bit machine were dropped entirely even before it then only used the low bits thanks to that '& (h->size-1)' masking. Anyway, for this patch to really matter, the hashing itself should matter (questionable) and the existing hash functions like that rangetr_hash() and filenametr_hash() should be better than they are. Paul already sent out a patch to improve on filenametr_hash(), although it is the case that full_name_hash() is really meant for *counted* strings, and since filenametr_hash() has to do the 'strlen()' just to see how lonbg the string is, it's not exactly efficient there either. Oh well. We do have a "hash and count length a word at a time" function, but that one is for pathnames, and stops at '/' characters in addition to terminating NUL characters. Linus