On 05/31/2011 07:13 PM, Boaz Harrosh wrote:
On 05/31/2011 03:35 PM, Ted Ts'o wrote:
On Tue, May 31, 2011 at 12:18:11PM +0200, Bernd Schubert wrote:
Out of interest, did anyone ever benchmark if dirindex provides any
advantages to readdir? And did those benchmarks include the
disadvantages of the present implementation (non-linear inode
numbers from readdir, so disk seeks on stat() (e.g. from 'ls -l') or
'rm -fr $dir')?
The problem is that seekdir/telldir is terminally broken (and so is
NFSv2 for using such a tiny cookie) in that it fundamentally assumes
a linear data structure. If you're going to use any kind of
tree-based data structure, a 32-bit "offset" for seekdir/telldir just
doesn't cut it. We actually play games where we memoize the low
32-bits of the hash and keep track of which cookies we hand out via
seekdir/telldir so that things mostly work --- except for NFSv2, where
with the 32-bit cookie, you're just hosed.
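To make the cookie-width problem concrete, here is a minimal sketch of the memoization idea described above. It is a toy model, not ext4's actual code: the class name, the method names, and the two sample positions are all hypothetical. It shows that two distinct 64-bit positions can share the same low 32 bits, so a 32-bit cookie is only unambiguous while some per-open-directory memo remembers which full position was meant.

```python
# Illustrative sketch only; not ext4's actual code. All names are hypothetical.

MASK32 = 0xFFFFFFFF


class CookieMemo:
    """Hands out 32-bit cookies for 64-bit directory positions."""

    def __init__(self):
        self._full = {}  # 32-bit cookie -> full 64-bit position

    def telldir(self, pos64):
        cookie = pos64 & MASK32      # keep only the low 32 bits
        self._full[cookie] = pos64   # remember the full position
        return cookie

    def seekdir(self, cookie):
        # Works only for cookies this open directory has handed out.
        return self._full[cookie]


memo = CookieMemo()
a = 0x00000001DEADBEEF
b = 0x00000002DEADBEEF  # different position, same low 32 bits

cookie_a = memo.telldir(a)
resolved_before = memo.seekdir(cookie_a)  # resolves to a
cookie_b = memo.telldir(b)                # same cookie, overwrites the memo
resolved_after = memo.seekdir(cookie_a)   # now resolves to b
```

A stateless server has no such memo to consult, so in this model it would be left with the bare 32 bits.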
The reason why we have to iterate over the directory in hash tree
order is that if we have a leaf node split, half the directory
entries get copied to another directory block. Given the promises made
by seekdir() and telldir() about directory entries appearing exactly
once during a readdir() stream, even if you hold the fd open for weeks
or days, you really have to iterate over things in hash order.
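The effect of a leaf split on the two iteration strategies can be sketched with a toy model. This is an assumption-laden simplification, not the htree implementation: the directory is a plain list of blocks holding (hash, name) pairs, a split moves the upper half of a block into a new block at the end, and all function names are hypothetical.

```python
# Toy model only; not the ext4 htree code. All names are hypothetical.

def split_leaf(blocks, i):
    """Move the upper half (by hash) of block i into a new block at the end."""
    entries = sorted(blocks[i])
    mid = len(entries) // 2
    blocks[i] = entries[:mid]
    blocks.append(entries[mid:])


def scan_linear(blocks, start, stop=None):
    """Read whole blocks in physical order, from block start up to stop."""
    return [name for block in blocks[start:stop] for _, name in block]


def scan_hash_order(blocks, after_hash):
    """Read entries whose hash is greater than after_hash, in hash order."""
    entries = sorted(e for block in blocks for e in block)
    return [name for h, name in entries if h > after_hash]


def make_dir():
    # Two leaf blocks; block 0 holds the low hashes, block 1 the high ones.
    return [[(10, "a"), (20, "b"), (30, "c"), (40, "d")],
            [(60, "e"), (70, "f")]]


# Linear reader: consumes block 0, then block 0 splits before it continues.
blocks = make_dir()
linear_seen = scan_linear(blocks, 0, 1)
split_leaf(blocks, 0)
linear_seen += scan_linear(blocks, 1)

# Hash-order reader: reads four entries, remembers the last hash (40),
# and resumes from that hash after the same split.
blocks = make_dir()
hash_seen = scan_hash_order(blocks, 0)[:4]
split_leaf(blocks, 0)
hash_seen += scan_hash_order(blocks, 40)
```

In this model the linear reader sees "c" and "d" twice, because they moved into a block it had not yet visited, while the hash-order reader sees every name exactly once.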
An open fd means that it does not survive a server reboot. Why don't you
keep an array per open fd and hand out the array index? In the array
you can keep a pointer to any info you want to keep (that's the meaning of
a cookie).
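A minimal sketch of the array-per-open-fd idea, under stated assumptions: the class, its methods, the max_slots cap, and the sample state dictionaries are all hypothetical and only illustrate the shape of the proposal. The cookie is just an index, and the slot can hold whatever position state is needed.

```python
# Illustrative sketch only; all names and the size cap are hypothetical.

class CookieTable:
    """Per-open-directory table: the cookie is an index into slots."""

    def __init__(self, max_slots):
        self.max_slots = max_slots  # assumed cap on memory use
        self.slots = []             # index -> arbitrary position state

    def telldir(self, state):
        if len(self.slots) >= self.max_slots:
            raise MemoryError("cookie table full")
        self.slots.append(state)
        return len(self.slots) - 1  # small integer cookie

    def seekdir(self, cookie):
        return self.slots[cookie]


table = CookieTable(max_slots=2)
c0 = table.telldir({"hash": 0x1DEADBEEF, "block": 7})
c1 = table.telldir({"hash": 0x2DEADBEEF, "block": 9})

# A third cookie does not fit under the assumed cap.
try:
    table.telldir({"hash": 0x3DEADBEEF, "block": 11})
    overflowed = False
except MemoryError:
    overflowed = True

# A fresh table, e.g. after a restart, knows nothing about old cookies.
restarted = CookieTable(max_slots=2)
try:
    restarted.seekdir(c1)
    stale_ok = True
except IndexError:
    stale_ok = False
```

The sketch makes the two limits visible: the table is bounded by whatever memory is set aside for it, and a cookie is meaningless once the table that issued it is gone.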
An array can take lots of memory for a large directory, of course. Do we
really want to do that in kernel space? I wouldn't have a problem with
reserving a certain amount of memory for that, though. But what do we
do if that gets exhausted (for example, because the directory is too
large or there are several open file descriptors)?
And how does that help with NFS and other cluster filesystems where the
client passes over the cookie? Do we ignore POSIX compliance then?
Thanks,
Bernd