> On Oct 11, 2022, at 7:37 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Tue, 11 Oct 2022, Chuck Lever III wrote:
>>> On Oct 10, 2022, at 8:16 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>> 
>>> On Fri, 07 Oct 2022, Chuck Lever wrote:
>>>> 
>>>> -static unsigned int file_hashval(struct svc_fh *fh)
>>>> +/*
>>>> + * The returned hash value is based solely on the address of an in-core
>>>> + * inode, a pointer to a slab-allocated object. The entropy in such a
>>>> + * pointer is concentrated in its middle bits.
>>> 
>>> I think you need more justification than that for throwing away some of
>>> the entropy, even if you don't think it is much.
>> 
>> We might have that justification:
>> 
>> https://lore.kernel.org/linux-nfs/YrUFbLJ5uVbWtZbf@ZenIV/
>> 
>> Actually I believe we are not discarding /any/ entropy in
>> this function. The bits we discard are invariant.
>> 
> 
> Ok, I can accept that this:
> 
> +	k = ptr >> L1_CACHE_SHIFT;
> 
> I searched for ">> *L1_CACHE_SHIFT". Apart from the nfsd
> filecache you mentioned I find two. One in quota and one in reiserfs.
> Both work with traditional hash tables which are more forgiving of
> longer chains.
> Do you have other evidence of this being a common trope?

This approach is based on the hash function in fs/inode.c, which uses
integer division instead of a shift:

static unsigned long hash(struct super_block *sb, unsigned long hashval)
{
	unsigned long tmp;

	tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
			L1_CACHE_BYTES;
	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
	return tmp & i_hash_mask;
}

> only discards invariant bits, but how can you justify this:
> 
> +	k &= 0x00ffffff;
> 
> ??

After shifting an address, the top byte generally contains invariant
bits as well.

> And given that you pass it all to jhash anyway, why not just pass all of
> it?

I don't think this is a big deal, but these functions are basically the
same as what was recently merged without complaint, so it's not a high
priority to revisit them. It might be worth a clean-up to share this
hash function between the two hash tables... at that point we might
consider removing the extra mask.

>> And, note that this is exactly the same situation we just merged
>> in the filecache overhaul, and is a common trope amongst other
>> hash tables that base their function on the inode's address.
>> 
>> 
>>> Presumably you think hashing 32 bits is faster than hashing 64 bits.
>>> Maybe it is, but is it worth it?
>>> 
>>> rhashtable depends heavily on having a strong hash function. In
>>> particular, if any bucket ends up with more than 16 elements it will
>>> choose a new seed and rehash. If you deliberately remove some bits that
>>> it might have used to spread those 16 out, then you are asking for
>>> trouble.
>>> 
>>> We know that two different file handles can refer to the same inode
>>> ("rarely"), and you deliberately place them in the same hash bucket.
>>> So if an attacker arranges to access 17 files with the same inode but
>>> different file handles, then the hashtable will be continuously
>>> rehashed.
>>> 
>>> The preferred approach when you require things to share a hash chain is
>>> to use an rhl table.
>> 
>> Again, this is the same situation for the filecache. Do you
>> believe it is worth reworking that? I'm guessing "yes".
> 
> As a matter of principle: yes.
> rhashtable is designed to assume that hash collisions are bad and can be
> changed by choosing a different seed.
> rhl_tables was explicitly added for cases when people wanted multiple
> elements to hash to the same value.
> 
> The chance of it causing a problem without an attack is admittedly
> tiny. Attacks are only possible with subtree_check enabled, or if the
> underlying filesystem does something "clever" with file handles, so
> there wouldn't be many situations where an attack would even be
> possible. But if it were possible, it would likely be easy.
> The cost of the attack would be a minor-to-modest performance impact.
> 
> So it is hard to argue "this code is dangerous and must be changed", but
> we have different tools for a reason, and I believe that rhl-tables is
> the right tool for this job.

Agreed. I wasn't suggesting it's an emergency situation, but it's
something that should get fixed at some point if there's a problem with
it, even a minor one.

I think I stopped at the non-list variant of rhashtable because using
rhl was more complex, and the non-list variant seemed to work fine.
There's no architectural reason either file_hashtbl or the filecache
must use the non-list variant. In any event, it's worth taking the
trouble now to change the nfs4_file implementation proposed here as you
suggest.

--
Chuck Lever
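
P.S. For anyone following along, the style of hash function under
discussion looks roughly like the sketch below. This is illustrative
only, not the code from the patch; the function and parameter names
are made up.

/*
 * Rough sketch of the approach being discussed: the slab-allocated
 * inode address has its invariant low-order bits shifted off, its
 * mostly-invariant high-order bits masked off, and the remaining
 * middle bits handed to jhash to spread entries across the table.
 */
#include <linux/cache.h>
#include <linux/fs.h>
#include <linux/jhash.h>

static u32 example_inode_hash(const struct inode *inode, u32 seed)
{
	unsigned long k = (unsigned long)inode;

	k >>= L1_CACHE_SHIFT;	/* low bits: invariant, from slab alignment */
	k &= 0x00ffffff;	/* top bits: effectively constant for slab addresses */
	return jhash_1word((u32)k, seed);
}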
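
And to make the rhl direction concrete, here is a minimal sketch
against the rhltable API, where multiple entries sharing one inode key
simply live on the same per-key list instead of forcing a rehash. The
structure and field names are illustrative, not the eventual nfs4_file
code.

#include <linux/fs.h>
#include <linux/rhashtable.h>

/* Illustrative only: several entries may share the same inode key. */
struct example_entry {
	struct rhlist_head	rlist;	/* rhlist_head, not rhash_head */
	struct inode		*inode;	/* hash key */
};

static const struct rhashtable_params example_params = {
	.key_len	= sizeof_field(struct example_entry, inode),
	.key_offset	= offsetof(struct example_entry, inode),
	.head_offset	= offsetof(struct example_entry, rlist),
	.automatic_shrinking = true,
};

static struct rhltable example_tbl;	/* set up with rhltable_init() */

/* Insert: a duplicate key just extends that key's list. */
static int example_insert(struct example_entry *new)
{
	return rhltable_insert(&example_tbl, &new->rlist, example_params);
}

/* Lookup: walk every entry that shares this inode, pick the right one. */
static struct example_entry *example_find(struct inode *inode,
					  bool (*match)(const struct example_entry *))
{
	struct example_entry *entry, *found = NULL;
	struct rhlist_head *list, *tmp;

	rcu_read_lock();
	list = rhltable_lookup(&example_tbl, &inode, example_params);
	rhl_for_each_entry_rcu(entry, tmp, list, rlist) {
		if (match(entry)) {
			/* real code would take a reference here */
			found = entry;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}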