Re: [PATCH v2 7/9] NFSD: Use rhashtable for managing nfs4_file objects

"NeilBrown" <neilb@xxxxxxx> · Wed, 12 Oct 2022 10:37:24 +1100

On Tue, 11 Oct 2022, Chuck Lever III wrote:
> > On Oct 10, 2022, at 8:16 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > 
> > On Fri, 07 Oct 2022, Chuck Lever wrote:
> >> 
> >> -static unsigned int file_hashval(struct svc_fh *fh)
> >> +/*
> >> + * The returned hash value is based solely on the address of an in-code
> >> + * inode, a pointer to a slab-allocated object. The entropy in such a
> >> + * pointer is concentrated in its middle bits.
> > 
> > I think you need more justification than that for throwing away some of
> > the entropy, even if you don't think it is much.
> 
> We might have that justification:
> 
> https://lore.kernel.org/linux-nfs/YrUFbLJ5uVbWtZbf@ZenIV/
> 
> Actually I believe we are not discarding /any/ entropy in
> this function. The bits we discard are invariant.
> 

Ok, I can accept that this:

+	k = ptr >> L1_CACHE_SHIFT;

only discards invariant bits, but how can you justify this:

+	k &= 0x00ffffff;

??

And given that you pass it all to jhash anyway, why not just pass all of
it?

> And, note that this is exactly the same situation we just merged
> in the filecache overhaul, and is a common trope amongst other
> hash tables that base their function on the inode's address.
> 
> 
> > Presumably you think hashing 32 bits is faster than hashing 64 bits.
> > Maybe it is, but is it worth it?
> > 
> > rhashtable depends heavily on having a strong hash function.  In
> > particular if any bucket ends up with more than 16 elements it will
> > choose a new seed and rehash.  If you deliberately remove some bits that
> > it might have been used to spread those 16 out, then you are asking for
> > trouble.
> > 
> > We know that two different file handles can refer to the same inode
> > ("rarely"), and you deliberately place them in the same hash bucket.
> > So if an attacker arranges to access 17 files with the same inode but
> > different file handles, then the hashtable will be continuously
> > rehashed.
> > 
> > The preferred approach when you require things to share a hash chain is
> > to use an rhl table.
> 
> Again, this is the same situation for the filecache. Do you
> believe it is worth reworking that? I'm guessing "yes".

As a matter of principle: yes.
rhashtable is designed to assume that hash collisions are bad and can be
changed by choosing a different seed.
rhl_tables was explicitly added for cases when people wanted multiple
elements to hash to the same value.

The chance of it causing a problem without an attack are admittedly
tiny.  Attacks are only possible with subtree_check enabled, or if the
underlying filesystem does something "clever" with file handles, so
there wouldn't be many situations where an attack would even be
possible.  But if it were possible, it would likely be easy.
The cost of the attack would be a minor-to-modest performance impact.

So it is hard to argue "this code is dangerous and must be changed", but
we have different tools for a reason, and I believe that rhl-tables is
the right tool for this job.

> 
> 
> > This allows multiple instances with the same key.
> > You would then key the rhl-table with the inode, and search a
> > linked-list to find the entry with the desired file handle.  This would
> > be no worse in search time than the current code for aliased inodes, but
> > less susceptible to attack.
> > 
> >> +/**
> >> + * nfs4_file_obj_cmpfn - Match a cache item against search criteria
> >> + * @arg: search criteria
> >> + * @ptr: cache item to check
> >> + *
> >> + * Return values:
> >> + *   %0 - Item matches search criteria
> >> + *   %1 - Item does not match search criteria
> > 
> > I *much* prefer %-ESRCH for "does not match search criteria".  It is
> > self-documenting.  Any non-zero value will do.
> 
> Noted, but returning 1 appears to be the typical arrangement for
> existing obj_cmpfn methods in most other areas.

Some people have no imagination :-)

[Same argument could be used against implementing kernel modules in Rust
 - Using C appears to be the typical arrangement for existing modules)

Thanks,
NeilBrown