On Mon, May 2, 2016 at 1:26 PM, George Spelvin <linux@xxxxxxxxxxx> wrote:
>
> But the hash_long is not odd at all. They're using it as the mix function
> between adding words. Like hash_name() uses "hash = (hash + a) * 9".

Right. But there is no reason to think that that should be the same thing
as the final hash. In fact, there are many reasons to think it should *not*.

The final hash is different. It's different not just because it wants to
concentrate the bits at the high end (which the middle hash does not), but
it's different exactly because the whole calling convention is different:
the final hash returns a "small" value (in your version a "u32"), while
the hash in the middle very much does not.

So thinking that they are somehow related is wrong. It screws up hash.h,
and incorrectly conflates this middle hash_mix() thing with the final
hash_32/64() thing.

> It's a quite reasonable mix function. One multiply (4 cycles or so) per
> 8 bytes. It's definitely swamped by the byte-at-a-time string loading.

.. but it's not. Exactly because of the above.

Make it be a "hash_mix()" function. Make it use the multiply, by all
means. Same multiplier, even. BUT IT IS STILL NOT THE SAME FUNCTION, for
the above reason. One wants to return "u32", the other does not. Really.

> But... the fundamental reason I didn't was that this is late -rc.
> I'm trying to fix a *bug* tglx found, not add risk and delay with a much
> more ambitious patch series.

Oh, this late in the rc we're not doing _any_ of this. I sent out my
suggested "late in the rc and for stable" patch that fixes the practical
problem we have, and that has nothing to do with cleaning things up.

> It *is* the FNV hash! More specifically, FNV-1a done a word at a time;
> doing it byte at a time like most implementations would result in 8
> times as many multiplies and be slower.

I refuse to call that shit "word at a time".
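The split Linus insists on above (a full-width mixing step inside the loop versus a narrowing final hash) might be sketched roughly like this; the function names and the multiplier are illustrative, not the kernel's actual code:

```c
#include <stdint.h>

/* Illustrative only: a mid-loop mixer keeps the full word width,
 * while the final hash narrows to a small value with the useful
 * bits shifted down from the high end. */
#define EXAMPLE_MULTIPLIER 0x61C8864680B583EBull	/* an odd 64-bit constant */

/* Middle-of-the-loop mix: u64 in, u64 out. No bits thrown away. */
static inline uint64_t hash_mix(uint64_t hash, uint64_t data)
{
	return (hash + data) * EXAMPLE_MULTIPLIER;
}

/* Final hash: u64 in, small u32 out. The multiply pushes entropy
 * toward the high bits, and the shift brings them down. */
static inline uint32_t final_hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * EXAMPLE_MULTIPLIER) >> (64 - bits));
}
```

The point is the calling convention: hash_mix() is applied repeatedly and must stay full width, while the final function is called once and deliberately returns only `bits` bits.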
It's "byte at a time with a slow conditional that will screw up your
branch predictor and a multiply in the middle". A compiler might be able
to turn it into some disgusting unrolled thing that avoids some of the
problems, but at no point is that a good thing.

I seriously think that it would (a) be more readable and (b) have a better
mix function if it was just kept as a byte-at-a-time thing entirely, with
the per-byte mixing done better than just "shift by 8 bits". And then at
the end you could do a single hash_long().

That horrible svc thing needs to go.

It's all pretty moot, since we have a reasonable version that actually
does do word-at-a-time. That can be improved too, I'm sure, but that
svcauth garbage should just be thrown out.

> The only difference is the exact multiplier. The published FNV-1
> uses a low-bit-weight multiplier. Since this is being done a word
> at a time, I think the stronger multiplier is worth it.

.. and considering that the "word-at-a-time" version is likely *slower*
than doing it byte-at-a-time the way it has been encoded, I'm not in the
least convinced.

> For example, partial_name_hash() is still used in many places
> even if the word-at-a-time code is used in namei.c.

Those places aren't actually all that performance-critical. They really
don't matter.

> Okay, let me list the problems...
>
> 1) hash_name() stops at a slash or a nul. hash_str() only looks
> for a nul. Should I add a third variant? Should that go in fs/namei,
> or should the whole pile be moved elsewhere?

We'll never get rid of "hash_name()": it not only has that '/' case, it's
also inlined for a reason. You'd copy it without the magic for '/' and
turn that into str_hash() for others to use. full_name_hash() can
presumably be used pretty much as-is as mem_hash().

> 2) Some places need the length *and* the hash. Calling strlen() and then
> full_name_hash() somewhat defeats the purpose of word-at-a-time access.
> hash_name returns both jammed into a 64-bit word.
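The quoted "both jammed into a 64-bit word" packing could look something like the following sketch; it is modeled on the dentry code's "hashlen" idea but written here from scratch as an illustration:

```c
#include <stdint.h>

/* Sketch: pack a 32-bit hash and a 32-bit length into one 64-bit
 * value, so the pair travels in a single register instead of two.
 * Illustrative macros, not the kernel's actual definitions. */
#define hashlen_create(hash, len) (((uint64_t)(len) << 32) | (uint32_t)(hash))
#define hashlen_hash(hashlen)     ((uint32_t)(hashlen))
#define hashlen_len(hashlen)      ((uint32_t)((hashlen) >> 32))
```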
> Is that a good interface in general?

Hmm. We actually have that exact case in the dentry code and hash_name(),
except we handle it by doing that "hashlen" thing that contains both the
length and the hash in one 64-bit value. Maybe we could expose that kind
of interface, even if it's pretty ugly.

> Myself, I think the length should be computed at copy_from_user()
> time and I'd like to look at each such call site and understand why
> it *doesn't* have the length ready. But that's a lot of work.

If it's copied from user space, we already do have the length.

You do realize that pathnames are different from pretty much every other
case in the kernel? There's a reason pathnames have their own logic. The
string length is very different from the component length, and it turns
out that the component length and hash are generally what is critical.

> 3) They do particularly crappy mixing. See that "RFC PATCH 4/2" I posted
> because I couldn't stand how bad it was.
>
> If you don't have word-at-a-time, the partial_name_hash() is decent,
> but the word-at-a-time code mixes by multiplying by 9. So the hashes
> of the strings "1.......0" and "0.......9" are identical.
>
> (I assume this was chosen as the best available one-instruction (LEA)
> mix operation due to an obsessive interest in speed in the dcache.)

The thing is, a lot of people tend to optimize performance (and behavior)
for large strings. For path components, the most common lengths are less
than a single 8-byte word! That "mixing" function almost doesn't matter,
because the case that matters the most (by far) is strings that fit in
one or _maybe_ two words.

Yes, things are slowly moving towards longer pathnames, but even with
long filenames, most component names are the directory entries, which
still tend to be shorter. We still have the old 3-letter ones close to
the root, of course, but the statistics are still pretty heavily skewed
towards <15 characters, especially if you look at directory names.
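George's collision example can be checked directly: with a word-at-a-time "hash = (hash + word) * 9" mix, the first words of "1.......0" and "0.......9" differ by 1 in one byte position and the second words differ by 9 in the same position, so the multiply by 9 makes the differences cancel. A small simulation (not the kernel's hash_name(), which loads unaligned words, masks the tail, and does a final fold):

```c
#include <stdint.h>
#include <string.h>

/* Simulate a hash_name()-style "hash = (hash + a) * 9" word-at-a-time
 * mix. Simplified illustration only: the tail word is zero-padded via
 * memcpy rather than loaded and masked as the real code does. */
static uint64_t mix9_hash(const char *s, size_t len)
{
	uint64_t hash = 0;

	while (len) {
		uint64_t a = 0;
		size_t n = len < 8 ? len : 8;

		memcpy(&a, s, n);	/* load up to 8 bytes, zero-padded */
		hash = (hash + a) * 9;
		s += n;
		len -= n;
	}
	return hash;
}
```

With this, mix9_hash("1.......0", 9) and mix9_hash("0.......9", 9) come out equal: exactly the kind of structured collision a stronger mid-loop mixer would avoid.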
So for pathnames, the mixing in the middle has tended to not be _nearly_
as important as the final mix, for the simple reason that it happens
maybe once, and often not at all.

> More crappy mixing is the final folding. On a 64-bit machine, the
> high and low 32 bits are just XORed together. So the hashes of
> "deadbeef" and "beefdead" are identical.

Hmm? It uses "fold_hash()", which definitely doesn't do that.

Are you still looking at partial_name_hash() and friends? Don't. They
are garbage. They are used for things like case-insensitive filesystems
etc.

> (If you have a particular cycle count budget in mind, I can come up with
> something.)

The real issue I had in fs/namei.c is that link_path_walk() and
__d_lookup_rcu() are literally _the_ hottest functions in the kernel
under a few (common) loads, and for code generation, things like register
pressure end up mattering.

The reason that "hash_len" is a single 64-bit field rather than two
32-bit fields, for example, is that that way it takes one register when
we do the hash lookup. Some of that code was tuned to inline - and _not_
inline - in particular patterns.

Now, some of that tuning may have bitrotted, of course, so I'm not saying
it's necessarily doing great now. But some of that code was tuned to not
need a stack frame etc.

> 4) They don't do a final mix. Again, obsessive interest in speed when
> you're storing the whole 32 bits and comparing that. For applications
> that use only a few bits of hash index and need the entropy
> "concentrated", should this be done outside?

I think the caller should do it, yes. There's a difference between
"calculate a hash for a string" and "turn that hash into a hashtable
lookup".

                 Linus