On 02.08.2010 18:12, Eric Sandeen wrote:
> On 08/02/2010 09:52 AM, Kay Diederichs wrote:
>> Dave, as you suggested, we reverted "ext4: Avoid group preallocation
>> for closed files" and this indeed fixes a big part of the problem.
>> After booting the NFS server we get:
>>
>> NFS-Server: turn5 2.6.32.16p i686
>> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>> exported directory on the nfs-server:
>> /dev/md5 /mnt/md5 ext4 rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>>
>>  48 seconds for preparations
>>  28 seconds to rsync 100 frames with 597M from nfs directory
>>  57 seconds to rsync 100 frames with 595M to nfs directory
>>  70 seconds to untar 24353 kernel files with 323M to nfs directory
>>  57 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 133 seconds to run xds_par in nfs directory
>> 425 seconds to run the script
>
> Interesting. I had found this commit to be a problem for small files
> which are constantly created & deleted; the commit had the effect of
> packing the newly created files into the first free space that could
> be found, rather than walking down the disk and leaving potentially
> fragmented freespace behind (see seekwatcher graph attached).
> Reverting the patch sped things up for this test, but left the
> filesystem freespace in bad shape.
>
> But you seem to see one of the largest effects here:
>
> 261 seconds to rsync 100 frames with 595M to nfs directory
>
> vs.
>
>  57 seconds to rsync 100 frames with 595M to nfs directory
>
> with the patch reverted making things go faster. So you are doing 100
> 6MB writes to the server, correct?
correct.
> Is the filesystem mkfs'd fresh before each test, or is it aged?
It is too big to "just create it freshly". It was actually created a week ago and filled by a single ~10-hour rsync job run on the server, so the filesystem should be filled in the most linear way possible. Since then, the benchmarking has created and deleted lots of files.
> If not mkfs'd, is it at least completely empty prior to the test, or
> does data remain on it? I'm just
it's not empty. df -h reports:

Filesystem            Size  Used Avail Use% Mounted on
/dev/md5              3.7T  2.8T  712G  80% /mnt/md5

e2freefrag-1.41.12 reports:

Device: /dev/md5
Blocksize: 4096 bytes
Total blocks: 976761344
Free blocks: 235345984 (24.1%)

Min. free extent: 4 KB
Max. free extent: 99348 KB
Avg. free extent: 1628 KB

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :          1858          1858    0.00%
    8K...   16K-  :          3415          8534    0.00%
   16K...   32K-  :          9952         54324    0.02%
   32K...   64K-  :         23884        288848    0.12%
   64K...  128K-  :         27901        658130    0.28%
  128K...  256K-  :         25761       1211519    0.51%
  256K...  512K-  :         35863       3376274    1.43%
  512K... 1024K-  :         48643       9416851    4.00%
    1M...    2M-  :        150311      60704033   25.79%
    2M...    4M-  :        244895     148283666   63.01%
    4M...    8M-  :          3970       5508499    2.34%
    8M...   16M-  :           187        551835    0.23%
   16M...   32M-  :           302       1765912    0.75%
   32M...   64M-  :           282       2727162    1.16%
   64M...  128M-  :            42        788539    0.34%
> wondering if fragmented freespace is contributing to this behavior as
> well. If there is fragmented freespace, then with the patch I think
> the allocator is more likely to hunt around for small discontiguous
> chunks of free space, rather than going further out in the disk
> looking for a large area to allocate from.
The last step of the benchmark, "xds_par", reads 600MB and writes 50MB. It has 16 threads, which might put some additional pressure on the freespace hunting. That step is also fast in 2.6.27.48 but slow in 2.6.32+.
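As a rough back-of-the-envelope check on the e2freefrag histogram above (using the free-block counts exactly as printed): summing the bins up to the 2M...4M row shows that about 95% of the free space sits in extents no larger than 4MB, so a 6MB frame can essentially never land in a single free extent and the allocator has to stitch chunks together:

```shell
# Free blocks in extents <= 4MB (all histogram rows up to "2M...4M"),
# versus the total free blocks reported by e2freefrag above.
small=$((1858 + 8534 + 54324 + 288848 + 658130 + 1211519 + \
         3376274 + 9416851 + 60704033 + 148283666))
total=235345984
awk -v s="$small" -v t="$total" \
    'BEGIN { printf "%.1f%% of free space is in extents <= 4MB\n", 100*s/t }'
# prints: 95.2% of free space is in extents <= 4MB
```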
> It might be interesting to use seekwatcher on the server to visualize
> the allocation/IO patterns for the test running just this far?
>
> -Eric
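For reference, a typical seekwatcher run (as described in its README; the trace name and the benchmark command below are placeholders for this setup, not something tested here) wraps the workload in blktrace and renders the seek/throughput graph:

```shell
# Placeholder workload: trace /dev/md5 while re-running one benchmark
# step, then render the graph from the collected trace. seekwatcher
# invokes blktrace itself, so it must run as root on the server.
seekwatcher -t frames.trace -o frames.png -d /dev/md5 \
    -p 'rsync -a client:/data/frames/ /mnt/md5/frames/'
```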
will try to install seekwatcher. thanks, Kay