On Tue, Apr 25, 2017 at 09:22:16PM +0200, Richard Weinberger wrote:
> Eric,
>
> On 25.04.2017 at 19:46, Eric Biggers wrote:
> >> Sorry if this is a stupid question, but why do you have to compare hashes _and_
> >> the last few bytes of the bigname?
> >> A lookup via bigname gives you two 32bits hash values, and there I'd assume that
> >> this is sufficient for a collisions free lookup. Especially since an
> >> resumed readdir() with a 64bits cookie has to work too on your filesystem.
> >>
> >
> > Well, the problem is that hashes may not be sufficient to uniquely identify a
> > name in all cases.  f2fs uses only a 32-bit hash so it's trivial to create
> > collisions on it, as I demonstrated.  Even collisions of two 32-bit hashes, as
> > used by ext4 and ubifs, are possible.  And ext4 currently doesn't even compare
> > the hashes during directory searches, beyond using them to find the correct
> > directory block, since the hashes aren't stored in the directory entries.
>
> I agree that finding a collision in a 32bits hash is easy, but for 64bits it
> is *much* harder.

That's true for accidental collisions, but malicious users might create
intentional collisions.  In the case of UBIFS it looks like the first 32 bits of
the cookie depend solely on the filename via key_r5_hash(), while the second
32 bits are random.  So I imagine a collision in the full 64 bits could be
generated by precomputing on average about 65536 filenames which collide in
key_r5_hash(), then creating them all in the same directory.

Eric
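
As a rough sanity check on the "about 65536" figure, here is a minimal sketch of the birthday estimate.  It assumes the second 32 bits of the cookie behave like independent, uniformly distributed random values, which is an assumption for illustration and not something verified against the UBIFS code.  The function names are hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate probability that at least two of n uniform random
    `bits`-bit values are equal (standard birthday approximation).

    Assumption: the values are independent and uniform, which may not
    match how a real filesystem picks the random half of its cookie.
    """
    space = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


def expected_draws_until_collision(bits: int = 32) -> float:
    """Approximate expected number of uniform draws until the first
    repeat, using sqrt(pi * N / 2) for a space of N values."""
    return math.sqrt(math.pi * (2 ** bits) / 2.0)


if __name__ == "__main__":
    # n = number of filenames that already collide in the name hash
    for n in (1 << 14, 1 << 16, 1 << 17):
        print(n, round(collision_probability(n), 3))
    print(round(expected_draws_until_collision()))
```

Under that assumption, 65536 names that already collide in the name hash give roughly a 39% chance that two of them also share the random 32 bits, and the expected count for a first collision is on the order of 82000.  So 2^16 is the right order of magnitude, and creating a few times that many names makes a full 64-bit collision close to certain.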