Does someone know how this is constructed and used?

On Mon, Oct 4, 2021 at 12:57 AM Avi Deitcher <avi@xxxxxxxxxxxx> wrote:
>
> Hi Andreas,
>
> I had looked in __ext4fs_dirhash(). Yes, it does reference the seed -
> and create a default if none is there at the filesystem level - but it
> doesn't appear to use it, in that function. hinfo is populated in the
> function - hash, minor-hash, seed - but it never uses the seed to
> manipulate the hash.
>
> Are you saying that it is at a higher level? i.e. __ext4fs_dirhash()
> is the *first* step, and there is further processing to get the actual
> hash? I did walk up the stack, but couldn't figure it out.
>
> Thanks for stepping in
> Avi
>
> On Sun, Oct 3, 2021 at 7:43 PM Andreas Dilger <adilger@xxxxxxxxx> wrote:
> >
> > On Oct 3, 2021, at 06:47, Avi Deitcher <avi@xxxxxxxxxxxx> wrote:
> > >
> > > I can narrow down the question further. In my live sample, one of the
> > > entries in the tree is for a directory named "dir155".
> > >
> > > If I run "dx_hash dir155", I get:
> > >
> > > # debugfs -R "dx_hash dir155" /var/lib/file.img
> > > debugfs 1.46.2 (28-Feb-2021)
> > > Hash of dir155 is 0x16279534 (minor 0x0)
> > >
> > > If I look in the tree with "htree_dump", I get:
> > >
> > > # debugfs -R "htree_dump /testdir" /var/lib/file.img
> > > debugfs 1.46.2 (28-Feb-2021)
> > > ....
> > > Entry #0: Hash 0x00000000, block 1
> > > Reading directory block 1, phys 6459
> > > 168 0x00d11d98-b9b6b16b (16) dir155 332 0x009edafe-77de7d72 (16) dir319
> > >
> > > That hash for dir155 does not match what dx_hash gave. If I try to
> > > take the code from fs/ext4/hash.c and build a small program to
> > > calculate the hash, I get:
> > >
> > > $ ./md4 dir155
> > > MD4: d90278a1 25182ac7 a02e56be c3f30f04
> > > hash: 0x25182ac6
> > > minor: 0xa02e56be
> > >
> > > Clearly that isn't what is in the tree. What basic thing am I missing?
> >
> > One important factor is that the directory hash has an initial seed
> > to prevent pathological cases where the user can construct thousands
> > of directory entries that have a hash collision.
> >
> > The comment for __ext4fs_dirhash() in the code explains this.
> > The seed itself comes from sbi->s_hash_seed and is stored in the
> > per-directory hinfo.seed to be used when computing the filename hash.
> > In theory there could be a per-directory seed, but it appears to be a
> > constant for the whole filesystem.
> >
> > Cheers, Andreas
> >
> > >
> > >> On Fri, Oct 1, 2021 at 2:49 PM Avi Deitcher <avi@xxxxxxxxxxxx> wrote:
> > >>
> > >> Hi,
> > >>
> > >> I have been trying to understand the algorithm used for the "half-md4"
> > >> in htree-structured directories. Going through the code (and trying
> > >> not to get into reverse engineering), it looks like it is part of md4
> > >> but not entirely? Yet any subset I take doesn't quite line up with
> > >> what I see in an actual sample.
> > >>
> > >> What is the algorithm it is using to turn an entry of, e.g., "file125"
> > >> into the appropriate hash? I did run a live sample, and tried to get
> > >> some form of correlation between the actual md4 hash (16 bytes) of the
> > >> above and the actual entry (4 bytes) shown by debugfs, without much
> > >> luck.
> > >>
> > >> What basic thing am I missing?
> > >>
> > >> Separately, how does the seed play into it?
> > >>
> > >> Thanks
> > >> Avi
> > >
> > > --
> > > Avi Deitcher
> > > avi@xxxxxxxxxxxx
> > > Follow me http://twitter.com/avideitcher
> > > Read me http://blog.atomicinc.com
> >
>

--
Avi Deitcher
avi@xxxxxxxxxxxx
Follow me http://twitter.com/avideitcher
Read me http://blog.atomicinc.com
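
For anyone following along, here is a minimal Python sketch of the two pieces the thread circles around: how the seed might select the starting state of the hash, and how the four resulting state words reduce to the "hash" and "minor" values printed by the ./md4 test program above. It deliberately leaves out the half-MD4 transform itself. The function names are hypothetical. The default constants are the standard MD4 initial values; the "all-zero seed falls back to the default" rule and the little-endian decoding of the seed UUID are assumptions based on one reading of fs/ext4/hash.c and the superblock layout, not something confirmed in this thread.

```python
import struct
import uuid

# Standard MD4 initial state words. Assumption: this is the "default"
# that __ext4fs_dirhash() falls back to when the filesystem has no seed.
MD4_DEFAULT_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def seed_words(seed_uuid):
    """Decode a seed given as a UUID string into four 32-bit words.

    Assumption: the 16 seed bytes are read as four little-endian words.
    """
    return struct.unpack("<4I", uuid.UUID(seed_uuid).bytes)


def initial_state(seed):
    """Return the four words the hash starts from.

    Assumption: a missing or all-zero seed means "use the default";
    any non-zero seed replaces the default state entirely.
    """
    if seed is not None and any(seed):
        return tuple(word & 0xFFFFFFFF for word in seed)
    return MD4_DEFAULT_STATE


def split_hash(state):
    """Reduce four final state words to (major, minor).

    This mirrors the ./md4 output quoted in the thread: the major hash
    is the second word with its lowest bit cleared, the minor hash is
    the third word.
    """
    major = state[1] & 0xFFFFFFFE
    minor = state[2]
    return major, minor


if __name__ == "__main__":
    # State words taken from the "MD4:" line quoted in the thread.
    major, minor = split_hash((0xD90278A1, 0x25182AC7, 0xA02E56BE, 0xC3F30F04))
    print(hex(major), hex(minor))  # 0x25182ac6 0xa02e56be
```

If this reading is right, a test program that always starts from the default state would only match the on-disk hashes of a filesystem whose seed is all zeros, which would be consistent with the mismatch reported above.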