On Sep 15, 2009 17:57 +0800, Zhang Huan wrote:
> I'm reading the EXT4 code and have some questions about the readdir
> implementation.
>
> Why traverse the directory in hash order? This brings lots of code to
> build and traverse a red-black tree. Why not just plainly traverse the
> directory's blocks?

Because htree does not maintain a constant ordering of directory entries.
Consider a readdir running at the same time as files are being added:

    readdir proc                        create proc

    read block 0
    read block 1
                                        create new file
                                        hash of filename puts entry
                                            into block 0
                                        block 0 is full, split it
                                        add new block 3
                                        copy 1/2 of block 0 entries
                                            to block 3
                                        add new filename to block 0
    read block 2
    read block 3

When the readdir returns block 3, half of the entries in that block are
the same ones that were already returned with the original block 0 read.
This is a violation of POSIX readdir() semantics, which require that each
existing entry be returned only a single time (it doesn't matter whether
the new filename is returned or not).

> Since the red-black tree is built every time an NFS readdir request
> comes in, in case of hash collision, the NFS client may receive
> duplicate dir entries if the buffer is not large enough to return all
> entries with the same hash value at once.

This is because NFSv2 only has a 32-bit cookie value, and if there is a
whole buffer full of entries with the same 32-bit hash there isn't
anything that can be done about it except drop the extra duplicates (a
very rare case).  Using the plain byte offset in the directory as the
cookie instead would have a much more serious problem: any directory
larger than 2^32 bytes in size would have all entries beyond 2GB lost.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
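
PS: here is a minimal userspace sketch of the split race in the timeline
above.  The block size, entry names, and helper functions are all made
up for illustration -- this is not ext4 code, and the concurrency is
flattened into a single thread with no locking:

/*
 * Toy model of block-order readdir racing with an htree block split.
 * Four entries fit per "block"; splitting a block moves the upper
 * half of its entries into a freshly allocated block.
 */
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS	4
#define PER_BLOCK	4

static char blocks[MAX_BLOCKS][PER_BLOCK][16];
static int  nentries[MAX_BLOCKS];
static int  nblocks = 3;

static void add_entry(int blk, const char *name)
{
	strcpy(blocks[blk][nentries[blk]++], name);
}

static void split_block(int blk)
{
	int half = nentries[blk] / 2;

	/* Copy the upper half of the entries to a new block. */
	for (int i = half; i < nentries[blk]; i++)
		add_entry(nblocks, blocks[blk][i]);
	nentries[blk] = half;
	nblocks++;
}

static void read_block(int blk)
{
	for (int i = 0; i < nentries[blk]; i++)
		printf("block %d: %s\n", blk, blocks[blk][i]);
}

int main(void)
{
	/* Initial state: block 0 is full, blocks 1 and 2 are not. */
	add_entry(0, "a"); add_entry(0, "b");
	add_entry(0, "c"); add_entry(0, "d");
	add_entry(1, "e");
	add_entry(2, "f");

	read_block(0);		/* readdir returns a, b, c, d       */
	read_block(1);		/* readdir returns e                */

	split_block(0);		/* create races in: c and d move    */
	add_entry(0, "new");	/*   to new block 3, "new" goes     */
				/*   into the half-empty block 0    */

	read_block(2);		/* readdir returns f                */
	read_block(3);		/* readdir returns c and d AGAIN    */
	return 0;
}

Running it prints "c" and "d" a second time when block 3 is read, which
is exactly the duplication the timeline shows.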
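
To make the 32-bit cookie point concrete, here is a sketch of why such
a cookie cannot resume in the middle of a run of colliding entries.  The
packing below only illustrates the idea of encoding the htree
(major, minor) hash pair into a readdir cookie; the actual encoding in
the kernel differs in detail:

#include <stdint.h>
#include <stdio.h>

/* 64-bit cookie: major hash in the top half, minor hash below it. */
static uint64_t cookie64(uint32_t major, uint32_t minor)
{
	return ((uint64_t)major << 32) | minor;
}

/* 32-bit cookie (NFSv2): only room for the major hash. */
static uint32_t cookie32(uint32_t major, uint32_t minor)
{
	(void)minor;		/* the minor hash is simply lost */
	return major;
}

int main(void)
{
	/* Two entries whose names collide in the major hash. */
	uint32_t major = 0xdeadbeef;

	printf("64-bit cookies: %llx %llx\n",
	       (unsigned long long)cookie64(major, 1),
	       (unsigned long long)cookie64(major, 2));
	printf("32-bit cookies: %x %x\n",
	       cookie32(major, 1), cookie32(major, 2));
	return 0;
}

With 64 bits the minor hash still distinguishes the two entries, so a
readdir can resume exactly where it left off.  With 32 bits both entries
produce the same cookie, so if a buffer boundary falls inside the
collision run the server cannot tell how many of those entries were
already sent -- hence the duplicates (or drops) described above.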