Thanks, Ted, I will try yours and step through it to figure out what is off. You ask a fair question: other than madness, why would someone want to recreate the exact algorithm? I have had a number of cases where I have needed to manipulate disks, filesystems, partition tables, etc. from within a non-C-source program. The options have been: fork/exec to some external program; run a VM where I can mount the fs and manipulate content as needed. Those both work, but have had issues in various environments. I made the mistake of saying, "well, all of this is just manipulating bytes on a disk image or block device; how hard could it be?" :-) So understanding the algorithm actually becomes important. I probably could link the library in if I am working on languages that support it (go, rust), but not all do, and there are reasons that is more difficult for the target use case. I was long hoping to finish with go and move onto Rust by now, then possibly some others, but this is not what I get paid for, so catch as catch can. Does that answer? Feel free to say, "madness" too; I won't disagree. Avi On Fri, Oct 15, 2021 at 12:10 PM Theodore Ts'o <tytso@xxxxxxx> wrote: > > On Fri, Oct 15, 2021 at 11:43:07AM -0700, Avi Deitcher wrote: > > I am absolutely stumped. I tried the seed as four u32 as is on disk > > (i.e. big-endian); four u32 little-endian; one long little-endian > > array of bytes (I have no idea why that would make sense, but worth > > trying); zeroed out so it gets the default. No one gives a consistent > > solution. > > > > As far as I can tell: hash tells you which intermediate block to look > > in, minor hash tells you which leaf block to look in, and then you > > scan. So it is pretty easy to see in what range the minor and major > > hash should be, but no luck. > > > > I put up a gist with debugfs and source and output. > > https://gist.github.com/deitch/53b01a90635449e7674babfe7e7dd002 > > > > Anyone who feels like a look-see, I would much appreciate it (and if > > they figure it out, owe a beer if ever in the same city). > > I'm really curious *why* you are trying to reverse engineer the > implementation. What are you trying to do? > > In any case, you're mostly right about what hash and minor_hash are > for. The 32-bit hash value is only thing that we use in the hashed B+ > tree which is used for hash directories. The 32-bit minor hash is > used to form a 64-bit number that gets used when we need to support > things like NFSv3 directory cursors, and POSIX telldir/seekdir (which > is a massive headache for modern file system, since it assumes that a > 64-bit "offset" is all you get to reliably provide the POSIX > telldir/seekdir/readdir guarantees even when the directory is getting > large number of directory entries added and deleted without limit > between the telldir and the seekdir call). > > As far as what you are doing wrong, I'll point out that *this* program > (attached below) provides the correct result. Running this through a > debugger and comparing it with your implrementation is left as an > exercise for the reader --- but why do you want to try to make your > own implementation, when you could just pull down e2fsprogs, compile > it, and then *use* the provided ext2_fs library for whatever the heck > you are trying to do? > > - Ted > > #include <stdio.h> > #include <stdlib.h> > > #include <et/com_err.h> > #include <uuid/uuid.h> > #include <ext2fs/ext2_fs.h> > #include <ext2fs/ext2fs.h> > > int main(int argc, char **argv) > { > uuid_t buf; > unsigned int *p; > int i; > ext2_dirhash_t hash, minor_hash; > errcode_t retval; > > uuid_parse("d64563bc-ea93-4aaf-a943-4657711ed153", buf); > p = (unsigned int *) buf; > for (i=0; i < 4; i++) { > printf("buf[%d] = 0x%08x\n", i, p[i]); > } > > retval = ext2fs_dirhash(1, "dir478", strlen("dir478"), p, > &hash, &minor_hash); > printf("dirhash results: retval=%u, hash=0x%08x, minor_hash=0x%08x\n", > i, hash, minor_hash); > > exit(0); > } > > % gcc -g -o /tmp/foo /tmp/foo.c -luuid -lext2fs -lcom_err > % /tmp/foo > buf[0] = 0xbc6345d6 > buf[1] = 0xaf4a93ea > buf[2] = 0x574643a9 > buf[3] = 0x53d11e71 > dirhash results: retval=4, hash=0x012225e2, minor_hash=0x3f08755d -- Avi Deitcher avi@xxxxxxxxxxxx Follow me http://twitter.com/avideitcher Read me http://blog.atomicinc.com