Re: getdents - ext4 vs btrfs performance

Phillip Susi <phillsusi@xxxxxxxxx> · Tue, 13 Mar 2012 16:22:52 -0400

On 3/13/2012 3:53 PM, Ted Ts'o wrote:
Because that would be a format change.

I think a format change would be preferable to runtime sorting.

What we have today is not a hash table; it's a hashed tree, where we
use a fixed-length key for the tree based on the hash of the file
name.  Currently the leaf nodes of the tree are the directory blocks
themselves; that is, the lowest level of the index blocks tells you to
look at directory block N, where that directory contains the directory
indexes for those file names which are in a particular range (say,
between 0x2325777A and 0x2325801).

So the index nodes contain the hash ranges for the leaf block, but the 
leaf block only contains the regular directory entries, not a hash for 
each name?  That would mean that adding or removing names would require 
moving around the regular directory entries wouldn't it?

If we aren't going to change the ordering of the directory directory,
that means we would need to change things so the leaf nodes contain
the actual directory file names themselves, so that we know whether or
not we've hit the correct entry or not before we go to read in a
specific directory block (otherwise, you'd have problems dealing with
hash collisions).  But in that case, instead of storing the pointer to
the directory entry, since the bulk of the size of a directory entry
is the filename itself, you might as well store the inode number in
the tree itself, and be done with it.

I would think that hash collisions are rare enough that reading a 
directory block you end up not needing once in a blue moon would be 
chalked up under "who cares".  So just stick with hash, offset pairs to 
map the hash to the normal directory entry.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html