ext4 and extremely slow filesystem traversal

Hello list,

  I have trouble with the daily backup of a modest filesystem, which
tends to take more than 10 hours. I have ext4 all over the place on ~200
servers and never ran into such a problem.

  The filesystem capacity is 300 GB (19.6M inodes) with 196 GB (9.3M
inodes) used. It's mounted 'defaults,noatime'. It sits on a hardware
RAID array through plain LVM slices. The RAID array is a RAID5 running on
5x SATA 500 GB disks, with a battery-backed (RAM) cache and write-back
cache policy. To be precise, it's an Areca 1231.

  The hardware RAID array uses 64 kB stripes and I've configured the
filesystem with 4 kB blocks and stride=16. It also has 0 reserved blocks.
In other words, the fs was created with 'mkfs -t ext4 -E stride=16 -m 0
-L volname /dev/vgX/Y'. I'm attaching the mke2fs.conf for reference too.
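
  For reference, here is how the stride value follows from the geometry
above, as a small sketch. The function name and its parameters are only
illustrative. It also prints a stripe-width value (stride times the
number of data disks), which I did not pass to mkfs; whether that
matters for this problem is an open question, not a claim.

```python
# Sketch: derive ext4 RAID layout hints from the array geometry.
# raid_hints() is a hypothetical helper, not part of any tool.

def raid_hints(chunk_kib, block_kib, total_disks, parity_disks):
    """Return (stride, stripe_width), both in filesystem blocks."""
    # stride: filesystem blocks per RAID chunk on one disk
    stride = chunk_kib // block_kib
    # stripe width: blocks per full stripe, counting data disks only
    data_disks = total_disks - parity_disks
    return stride, stride * data_disks

# 64 kB chunks, 4 kB blocks, 5 disks in RAID5 (one disk's worth of parity)
stride, stripe_width = raid_hints(64, 4, 5, 1)
print(f"-E stride={stride},stripe-width={stripe_width}")
```

  With the numbers from this array, stride comes out at 16, which
matches what I used.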

  Everything is running Debian Squeeze and its 2.6.32 kernel (amd64
flavour), on a server with 4 cores and 4 GB of RAM.

  I ran tiobench tonight on an idle instance (I have two identical
systems - hw, sw, data - with exactly the same problem). I've attached
the results as plain text to protect them from line wrapping. They look
fine to me.
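
  As a sanity check on the single-thread random read line of the
attached results (0.79 MB/s at 4096-byte blocks, 4.951 ms average
latency), the rate and the latency can be compared with a few lines of
arithmetic. This is only a sketch: the helper name is made up, and I am
assuming tiotest's "megabytes" means 2^20 bytes.

```python
# Sketch: cross-check tiobench random-read rate against its latency.
# Assumption: 1 MB = 1024 * 1024 bytes in tiotest's output.

def reads_per_second(rate_mb_s, block_bytes, mb=1024 * 1024):
    """Convert a throughput in MB/s to block-sized reads per second."""
    return rate_mb_s * mb / block_bytes

# Single-thread random read line: 0.79 MB/s, 4096-byte blocks
iops = reads_per_second(0.79, 4096)

# One outstanding request at 4.951 ms each bounds the rate at 1/latency
latency_bound = 1000 / 4.951

print(round(iops), round(latency_bound))
```

  Both figures land at roughly 200 reads per second, so the benchmark is
at least consistent with itself for one thread.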

  When I try to back up the problematic filesystem with tar, rsync or
any other tool that traverses the whole filesystem, things are awful. I
know that this filesystem has *lots* of directories, most with few or no
files in them. Tonight I ran a simple 'find /path/to/vol -type d |pv
-bl' (counts directories as they are found). I stopped it more than 2
hours later: it was not done, and had already counted more than 2M
directories. IO stats showed 1000 read calls/sec with avq=1 and avio=5
ms. CPU usage is 2%, so it is totally I/O bound. This looks like the
worst random read case to me.

  I even tried a hack which sorts directories while traversing the
filesystem, to no avail.
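
  To give an idea of what such a hack can look like, here is a minimal
sketch that counts directories while visiting each directory's
subdirectories in ascending inode order. It is not the exact tool I
used, the function name is made up, and I make no claim that inode
order matches on-disk order on this filesystem.

```python
import os

def count_dirs_inode_order(root):
    """Count directories under root (root included), descending into
    each directory's subdirectories in ascending inode order."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        count += 1
        subdirs = []
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    # is_dir() and inode() normally use the data returned
                    # by readdir, so no extra stat() call per entry
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append((entry.inode(), entry.path))
        except OSError:
            # unreadable directory: it is counted but not descended into
            continue
        # reverse sort so that pop() returns the lowest inode first
        subdirs.sort(reverse=True)
        stack.extend(p for _, p in subdirs)
    return count
```

  Calling count_dirs_inode_order('/path/to/vol') should give the same
total as the find command above, only with a different visiting order.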

  Right now I don't even know how to analyze my filesystem further.
Sorry for not being able to describe it more accurately. I'm looking
for any advice or direction to improve this situation, while continuing
to use ext4 of course :).

  PS: I did ask the developers not to abuse the filesystem that way,
and told them that in 2013 it's okay to have 10k+ files per directory...
No success, so I guess I'll have to work around it.

filer:/srv/painfulvol/bench# tiobench --size 10000
Run #1: /usr/bin/tiotest -t 8 -f 1250 -r 500 -b 4096 -d . -T

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  215.82 42.21%     0.017     1384.29   0.00000  0.00000   511
2.6.32-5-amd64               10000  4096    2  129.51 48.53%     0.057     5115.46   0.00020  0.00000   267
2.6.32-5-amd64               10000  4096    4   89.80 66.26%     0.168     6697.64   0.00043  0.00000   136
2.6.32-5-amd64               10000  4096    8   77.11 113.3%     0.394     6750.12   0.00102  0.00000    68

Random Reads
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1    0.79 0.302%     4.951       58.56   0.00000  0.00000   260
2.6.32-5-amd64               10000  4096    2    0.41 0.328%    17.165      174.55   0.00000  0.00000   126
2.6.32-5-amd64               10000  4096    4    0.80 1.024%    18.848      358.64   0.00000  0.00000    78
2.6.32-5-amd64               10000  4096    8    0.82 1.801%    35.989      808.74   0.00000  0.00000    45

Sequential Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  243.70 78.53%     0.014      492.80   0.00000  0.00000   310
2.6.32-5-amd64               10000  4096    2  186.89 150.9%     0.037     1969.62   0.00000  0.00000   124
2.6.32-5-amd64               10000  4096    4  113.90 209.8%     0.122     6303.26   0.00137  0.00000    54
2.6.32-5-amd64               10000  4096    8   88.32 336.6%     0.307     9451.83   0.00285  0.00000    26

Random Writes
                              File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier                    Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.6.32-5-amd64               10000  4096    1  107.11 101.4%     0.009        0.06   0.00000  0.00000   106
2.6.32-5-amd64               10000  4096    2  173.32 337.2%     0.010        0.04   0.00000  0.00000    51
2.6.32-5-amd64               10000  4096    4  224.92 921.3%     0.011        0.76   0.00000  0.00000    24
2.6.32-5-amd64               10000  4096    8  206.05 1598.%     0.012        1.00   0.00000  0.00000    13

[defaults]
	base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
	blocksize = 4096
	inode_size = 256
	inode_ratio = 16384

[fs_types]
	ext3 = {
		features = has_journal
	}
	ext4 = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		inode_size = 256
	}
	ext4dev = {
		features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
		inode_size = 256
		options = test_fs=1
	}
	small = {
		blocksize = 1024
		inode_size = 128
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 1024
		inode_size = 128
		inode_ratio = 8192
	}
	news = {
		inode_ratio = 4096
	}
	largefile = {
		inode_ratio = 1048576
		blocksize = -1
	}
	largefile4 = {
		inode_ratio = 4194304
		blocksize = -1
	}
	hurd = {
	     blocksize = 4096
	     inode_size = 128
	}
_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users