Re: Ext3/ext4 in a clustered environment

Steven Whitehouse wrote:

We see appreciable knee points in GFS directory performance at 512, 4096 and 16384 files per directory, with progressively worse performance deterioration between each pair of knees. (It's a 2^n type problem.)

That is a bit strange. The GFS2 directory entries are sized according to
(length of file name + length of fixed size info) which means that
generally the number of blocks required to store a specific number of
files is not constant unless the file names are all the same length.

Generally the file names are all the same length, as are the file sizes.

Also, once a directory has been unstuffed, the hash table will grow
until it is 128k in size, which is 16k pointers. So with 16384 directory
entries, you should be a long way from having a full hash table, since
each leaf block should contain around 80 entries (again depending on
filename length), so that's not too far off 1m entries.
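
For reference, the arithmetic behind that estimate can be written out as a short sketch. The 128k table size, the 16k pointer count and the figure of roughly 80 entries per leaf block come from the paragraph above; the 8-byte pointer size is an assumption inferred from 128k / 16k, and the constant names are purely illustrative.

```python
# Back-of-envelope capacity estimate for a fully grown directory hash table.
# Figures are taken from the message above; POINTER_BYTES is inferred
# (128 KiB / 16k pointers), not read from any on-disk format definition.
HASH_TABLE_BYTES = 128 * 1024
POINTER_BYTES = 8            # assumption: 64-bit block pointers
ENTRIES_PER_LEAF = 80        # approximate, depends on filename length

pointers = HASH_TABLE_BYTES // POINTER_BYTES
capacity = pointers * ENTRIES_PER_LEAF

print(pointers)   # 16384
print(capacity)   # 1310720
```

Under those assumptions a directory of 16384 entries fills only about 1.25% of the estimated capacity.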

Should be, but performance becomes unusable long before that happens.

So for all unstuffed directories with fewer than about 1m entries, I'd
expect to see all accesses resulting in the following I/O pattern:
 1. Look up hash table block
 2. Look up dir leaf block
 3. Look up inode (if this is a ->lookup rather than readdir)
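
The three-step pattern above can be illustrated with a toy model. This is only a sketch: the class, its fields, the 4-bit table size and the use of zlib.crc32 as the hash are all stand-ins chosen for illustration, and nothing here reflects the real GFS2 on-disk layout or locking.

```python
import zlib


class ToyDirectory:
    """Toy model of a hashed directory: hash table -> leaf block -> inode."""

    def __init__(self, table_bits=4):
        self.table_bits = table_bits
        self.hash_table = {}   # hash index -> leaf id
        self.leaves = {}       # leaf id -> {filename: inode number}
        self.inodes = {}       # inode number -> metadata dict
        self.reads = 0         # simulated block reads for the last lookup

    def _index(self, name):
        # Use the top bits of a 32-bit CRC as the hash table index.
        return zlib.crc32(name.encode()) >> (32 - self.table_bits)

    def add(self, name, inum, meta):
        idx = self._index(name)
        leaf_id = self.hash_table.setdefault(idx, idx)
        self.leaves.setdefault(leaf_id, {})[name] = inum
        self.inodes[inum] = meta

    def lookup(self, name):
        self.reads = 0
        leaf_id = self.hash_table[self._index(name)]   # 1. hash table block
        self.reads += 1
        inum = self.leaves[leaf_id][name]              # 2. directory leaf block
        self.reads += 1
        meta = self.inodes[inum]                       # 3. inode
        self.reads += 1
        return meta


d = ToyDirectory()
d.add("sample.dat", 1, {"size": 10})
print(d.lookup("sample.dat"), d.reads)
```

In this model every successful lookup costs exactly three simulated reads regardless of how many entries the directory holds, which is the expectation being described.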

What test are you using to generate the performance figures in this
case?

"ls -l" - which is what the clients are using as they import data for number crunching work. Rsync uses a raw directory read but the stat() calls on individual files are pretty similar.

Once the information is cached, accessing the directory is fast until the cache expires (3-10 minutes).

There is definite and very measurable performance degradation as more files are added to a directory. Even on something as simple as an incremental backup, the number of files opened per second falls away rapidly as directories get larger.
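
A rough way to reproduce this kind of measurement is to time a directory read plus one stat() per entry, which approximates what "ls -l" does. The sketch below makes several assumptions: the helper names and the file naming scheme are made up, the directory sizes are simply the knee points mentioned earlier, and the target directory must sit on the filesystem under test with a cold cache for the numbers to mean anything.

```python
import os
import tempfile
import time


def populate(directory, n):
    """Create n empty files with equal-length names in directory."""
    for i in range(n):
        with open(os.path.join(directory, f"file{i:06d}.dat"), "w"):
            pass


def time_listing(directory):
    """Read the directory and stat every entry; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for entry in os.scandir(directory):
        entry.stat(follow_symlinks=False)   # one stat per file, like "ls -l"
        count += 1
    return count, time.perf_counter() - start


def run(sizes=(512, 4096, 16384), base=None):
    """Time listings for several directory sizes under base (hypothetical helper)."""
    for n in sizes:
        with tempfile.TemporaryDirectory(dir=base) as d:
            populate(d, n)
            count, secs = time_listing(d)
            print(f"{count:6d} files: {secs:.4f}s ({count / secs:.0f} stats/s)")
```

Calling run(base="/path/on/the/filesystem/under/test") prints one line per directory size. Because the files have just been created by the same node, the results will be cached ones unless the cache is dropped or the listing is run from a different node.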




--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

