Re: algorithm for half-md4 used in htree directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks, Ted, I will try yours and step through it to figure out what is off.

You ask a fair question: other than madness, why would someone want to
recreate the exact algorithm?

I have had a number of cases where I have needed to manipulate disks,
filesystems, partition tables, etc. from within a non-C-source
program. The options have been: fork/exec to some external program;
run a VM where I can mount the fs and manipulate content as needed.
Those both work, but have had issues in various environments.

I made the mistake of saying, "well, all of this is just manipulating
bytes on a disk image or block device; how hard could it be?" :-)
So understanding the algorithm actually becomes important.

I probably could link the library in if I am working on languages that
support it (go, rust), but not all do, and there are reasons that is
more difficult for the target use case.

I was long hoping to finish with go and move onto Rust by now, then
possibly some others, but this is not what I get paid for, so catch as
catch can.

Does that answer? Feel free to say, "madness" too; I won't disagree.
Avi

On Fri, Oct 15, 2021 at 12:10 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Fri, Oct 15, 2021 at 11:43:07AM -0700, Avi Deitcher wrote:
> > I am absolutely stumped. I tried the seed as four u32 as is on disk
> > (i.e. big-endian); four u32 little-endian; one long little-endian
> > array of bytes (I have no idea why that would make sense, but worth
> > trying); zeroed out so it gets the default. No one gives a consistent
> > solution.
> >
> > As far as I can tell: hash tells you which intermediate block to look
> > in, minor hash tells you which leaf block to look in, and then you
> > scan. So it is pretty easy to see in what range the minor and major
> > hash should be, but no luck.
> >
> > I put up a gist with debugfs and source and output.
> > https://gist.github.com/deitch/53b01a90635449e7674babfe7e7dd002
> >
> > Anyone who feels like a look-see, I would much appreciate it (and if
> > they figure it out, owe a beer if ever in the same city).
>
> I'm really curious *why* you are trying to reverse engineer the
> implementation.  What are you trying to do?
>
> In any case, you're mostly right about what hash and minor_hash are
> for.  The 32-bit hash value is only thing that we use in the hashed B+
> tree which is used for hash directories.  The 32-bit minor hash is
> used to form a 64-bit number that gets used when we need to support
> things like NFSv3 directory cursors, and POSIX telldir/seekdir (which
> is a massive headache for modern file system, since it assumes that a
> 64-bit "offset" is all you get to reliably provide the POSIX
> telldir/seekdir/readdir guarantees even when the directory is getting
> large number of directory entries added and deleted without limit
> between the telldir and the seekdir call).
>
> As far as what you are doing wrong, I'll point out that *this* program
> (attached below) provides the correct result.  Running this through a
> debugger and comparing it with your implrementation is left as an
> exercise for the reader --- but why do you want to try to make your
> own implementation, when you could just pull down e2fsprogs, compile
> it, and then *use* the provided ext2_fs library for whatever the heck
> you are trying to do?
>
>                                        - Ted
>
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <et/com_err.h>
> #include <uuid/uuid.h>
> #include <ext2fs/ext2_fs.h>
> #include <ext2fs/ext2fs.h>
>
> int main(int argc, char **argv)
> {
>         uuid_t  buf;
>         unsigned int *p;
>         int i;
>         ext2_dirhash_t hash, minor_hash;
>         errcode_t retval;
>
>         uuid_parse("d64563bc-ea93-4aaf-a943-4657711ed153", buf);
>         p = (unsigned int *) buf;
>         for (i=0; i < 4; i++) {
>                 printf("buf[%d] = 0x%08x\n", i, p[i]);
>         }
>
>         retval = ext2fs_dirhash(1, "dir478", strlen("dir478"), p,
>                                 &hash, &minor_hash);
>         printf("dirhash results: retval=%u, hash=0x%08x, minor_hash=0x%08x\n",
>                i, hash, minor_hash);
>
>         exit(0);
> }
>
> % gcc -g -o /tmp/foo /tmp/foo.c -luuid -lext2fs -lcom_err
> % /tmp/foo
> buf[0] = 0xbc6345d6
> buf[1] = 0xaf4a93ea
> buf[2] = 0x574643a9
> buf[3] = 0x53d11e71
> dirhash results: retval=4, hash=0x012225e2, minor_hash=0x3f08755d



-- 
Avi Deitcher
avi@xxxxxxxxxxxx
Follow me http://twitter.com/avideitcher
Read me http://blog.atomicinc.com



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux