Re: Bad performance of ext4 with kernel 3.0.17


Two things I'd try:

#1) If this is a freshly created file system, the kernel may be
initializing the inode table in the background, and this could be
interfering with your benchmark workload.  To address this, you can
either (a) add the mount option noinit_itable, (b) add the mke2fs
option "-E lazy_itable_init=0" --- but this will make the mke2fs
take a lot longer, or (c) mount the file system and wait until
"dumpe2fs /dev/md3 | tail" shows that the last block group has the
ITABLE_ZEROED flag set.  For benchmarking purposes on a scratch
workload, option (a) above is the fastest thing to do.
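The three options as commands, roughly (using the /dev/md3 device and
/mnt/test mountpoint from the report below; these touch a real block
device, so treat them as a sketch rather than something to paste in
blindly):

```shell
# (a) mount with lazy inode-table initialization disabled
mount -o noinit_itable /dev/md3 /mnt/test

# (b) zero the inode tables at mkfs time instead (mkfs takes much longer)
mke2fs -E lazy_itable_init=0 /dev/md3

# (c) or mount normally and poll until the last block group reports
#     ITABLE_ZEROED, i.e. background zeroing has finished
dumpe2fs /dev/md3 | tail
```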

#2) It could be that the file system is choosing blocks farther away
from the beginning of the disk, which is slower, whereas fio on
the raw disk will use the blocks closest to the beginning of the disk,
which are the fastest ones.  You could try creating the file system so
it is only 10GB, and then try running fio on that small, truncated
file system, and see if that makes a difference.
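Concretely, mke2fs takes an optional size argument (in filesystem
blocks) after the device, so a 10GB filesystem could be made like this
(assuming the default 4 KiB block size; verify with "dumpe2fs -h"
afterwards):

```shell
# 10 GiB expressed as a count of 4 KiB filesystem blocks
BLOCKS=$(( 10 * 1024 * 1024 * 1024 / 4096 ))
echo "$BLOCKS"

# Then, keeping the same stride/stripe-width as before (run manually --
# this destroys whatever is on /dev/md3):
#   mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3 "$BLOCKS"
```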

     	     	     	     	   - Ted


On Thu, Mar 01, 2012 at 01:31:58PM +0800, Xupeng Yun wrote:
> I just set up a new server (Gentoo 64bit with kernel 3.0.17) with 4 x
> 15000RPM SAS disks(sdc, sdd, sde and sdf), and created soft RAID10 on
> top of them, the partitions are aligned at 1MB:
> 
>     # fdisk -lu /dev/sd{c,e,d,f}
> 
>     Disk /dev/sdc: 600.1 GB, 600127266816 bytes
>     255 heads, 63 sectors/track, 72961 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xdd96eace
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdc1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sde: 600.1 GB, 600127266816 bytes
>     3 heads, 63 sectors/track, 6201712 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xf869ba1c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sde1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sdd: 600.1 GB, 600127266816 bytes
>     81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xf869ba1c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdd1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
>     Disk /dev/sdf: 600.1 GB, 600127266816 bytes
>     81 heads, 63 sectors/track, 229693 cylinders, total 1172123568 sectors
>     Units = sectors of 1 * 512 = 512 bytes
>     Sector size (logical/physical): 512 bytes / 512 bytes
>     I/O size (minimum/optimal): 512 bytes / 512 bytes
>     Disk identifier: 0xb4893c3c
> 
>        Device Boot      Start         End      Blocks   Id  System
>     /dev/sdf1            2048  1172123567   586060760   fd  Linux raid
> autodetect
> 
> 
> and here is the RAID 10 (md3) with 64K chunk size:
> 
>     cat /proc/mdstat
>     Personalities : [raid0] [raid1] [raid10]
>     md3 : active raid10 sdf1[3] sde1[2] sdd1[1] sdc1[0]
>           1172121344 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
>     md1 : active raid1 sda1[0] sdb1[1]
>           112320 blocks [2/2] [UU]
> 
>     md2 : active raid1 sda2[0] sdb2[1]
>           41953664 blocks [2/2] [UU]
> 
>     unused devices: <none>
> 
> I did IO testing with `fio` against the raw RAID device (md3), and the
> result looks good(read IOPS 1723 / write IOPS 168):
> 
>     # fio --filename=/dev/md3 --direct=1 --rw=randrw --bs=16k
> --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
> --rwmixread=90 --thread --ioengine=psync
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     ...
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     fio 2.0.3
>     Starting 16 threads
>     Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [28234K/2766K
> /s] [1723 /168  iops] [eta 00m:00s]
>     file1: (groupid=0, jobs=16): err= 0: pid=17107
>       read : io=1606.3MB, bw=27406KB/s, iops=1712 , runt= 60017msec
>         clat (usec): min=221 , max=123233 , avg=7693.00, stdev=7734.82
>          lat (usec): min=221 , max=123233 , avg=7693.12, stdev=7734.82
>         clat percentiles (usec):
>          |  1.00th=[ 1128],  5.00th=[ 1560], 10.00th=[ 1928], 20.00th=[ 2640],
>          | 30.00th=[ 3376], 40.00th=[ 4128], 50.00th=[ 4896], 60.00th=[ 6304],
>          | 70.00th=[ 8256], 80.00th=[11200], 90.00th=[16768], 95.00th=[23168],
>          | 99.00th=[38656], 99.50th=[45824], 99.90th=[62720]
>         bw (KB/s)  : min=  888, max=13093, per=7.59%, avg=2079.11, stdev=922.54
>       write: io=183840KB, bw=3063.2KB/s, iops=191 , runt= 60017msec
>         clat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
>          lat (msec): min=1 , max=153 , avg=14.70, stdev=14.59
>         clat percentiles (usec):
>          |  1.00th=[ 1816],  5.00th=[ 2544], 10.00th=[ 3248], 20.00th=[ 4512],
>          | 30.00th=[ 5728], 40.00th=[ 7648], 50.00th=[ 9536], 60.00th=[12480],
>          | 70.00th=[16320], 80.00th=[22144], 90.00th=[32640], 95.00th=[43264],
>          | 99.00th=[71168], 99.50th=[82432], 99.90th=[111104]
>         bw (KB/s)  : min=   90, max= 5806, per=33.81%, avg=1035.45, stdev=973.10
>         lat (usec) : 250=0.05%, 500=0.09%, 750=0.05%, 1000=0.19%
>         lat (msec) : 2=9.61%, 4=26.05%, 10=38.46%, 20=16.82%, 50=8.02%
>         lat (msec) : 100=0.63%, 250=0.03%
>       cpu          : usr=1.02%, sys=2.87%, ctx=1926728, majf=0, minf=288891
>       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> 32=0.0%, >=64=0.0%
>          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          issued    : total=r=102801/w=11490/d=0, short=r=0/w=0/d=0
> 
>     Run status group 0 (all jobs):
>        READ: io=1606.3MB, aggrb=27405KB/s, minb=28063KB/s,
> maxb=28063KB/s, mint=60017msec, maxt=60017msec
>       WRITE: io=183840KB, aggrb=3063KB/s, minb=3136KB/s,
> maxb=3136KB/s, mint=60017msec, maxt=60017msec
> 
>     Disk stats (read/write):
>         md3: ios=102753/11469, merge=0/0, ticks=0/0, in_queue=0,
> util=0.00%, aggrios=25764/5746, aggrmerge=0/0, aggrticks=197378/51351,
> aggrin_queue=248718, aggrutil=99.31%
>       sdc: ios=26256/5723, merge=0/0, ticks=204328/68364,
> in_queue=272668, util=99.20%
>       sdd: ios=25290/5723, merge=0/0, ticks=187572/61628,
> in_queue=249188, util=98.73%
>       sde: ios=25689/5769, merge=0/0, ticks=197340/71828,
> in_queue=269172, util=99.31%
>       sdf: ios=25822/5769, merge=0/0, ticks=200272/3584,
> in_queue=203844, util=97.87%
> 
> then I created ext4 filesystem on top of the RAID device and mounted
> it to /mnt/test:
> 
>     mkfs.ext4 -E stride=16,stripe-width=32 /dev/md3
>     mount /dev/md3 /mnt/test -o noatime,nodiratime,data=writeback,nobarrier
> 
> after that I did the very same IO testing, but the result looks very
> bad(read IOPS 926 / write IOPS 97):
> 
>     # fio --filename=/mnt/test/test --direct=1 --rw=randrw --bs=16k
> --size=5G --numjobs=16 --runtime=60 --group_reporting --name=file1
> --rwmixread=90 --thread --ioengine=psync
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     ...
>     file1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
>     fio 2.0.3
>     Starting 16 threads
>     file1: Laying out IO file(s) (1 file(s) / 5120MB)
>     Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [15172K/1604K
> /s] [926 /97  iops] [eta 00m:00s]
>     file1: (groupid=0, jobs=16): err= 0: pid=18764
>       read : io=838816KB, bw=13974KB/s, iops=873 , runt= 60025msec
>         clat (usec): min=228 , max=111583 , avg=16412.46, stdev=11632.03
>          lat (usec): min=228 , max=111583 , avg=16412.60, stdev=11632.03
>         clat percentiles (usec):
>          |  1.00th=[ 1384],  5.00th=[ 2320], 10.00th=[ 3376], 20.00th=[ 5216],
>          | 30.00th=[ 8256], 40.00th=[11456], 50.00th=[14656], 60.00th=[17792],
>          | 70.00th=[21376], 80.00th=[25472], 90.00th=[32128], 95.00th=[37632],
>          | 99.00th=[50944], 99.50th=[56576], 99.90th=[70144]
>         bw (KB/s)  : min=  308, max= 4448, per=6.90%, avg=964.30, stdev=339.53
>       write: io=94208KB, bw=1569.5KB/s, iops=98 , runt= 60025msec
>         clat (msec): min=1 , max=89 , avg=16.91, stdev=10.24
>          lat (msec): min=1 , max=89 , avg=16.92, stdev=10.24
>         clat percentiles (usec):
>          |  1.00th=[ 2384],  5.00th=[ 3888], 10.00th=[ 5088], 20.00th=[ 7776],
>          | 30.00th=[10304], 40.00th=[12736], 50.00th=[15296], 60.00th=[17792],
>          | 70.00th=[20864], 80.00th=[24960], 90.00th=[30848], 95.00th=[35584],
>          | 99.00th=[47360], 99.50th=[51456], 99.90th=[62208]
>         bw (KB/s)  : min=   31, max= 4676, per=62.37%, avg=978.64, stdev=896.53
>         lat (usec) : 250=0.01%, 500=0.03%, 750=0.01%, 1000=0.06%
>         lat (msec) : 2=3.15%, 4=9.42%, 10=22.23%, 20=31.61%, 50=32.39%
>         lat (msec) : 100=1.08%, 250=0.01%
>       cpu          : usr=0.59%, sys=2.63%, ctx=1700318, majf=0, minf=19888
>       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> 32=0.0%, >=64=0.0%
>          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> 64=0.0%, >=64=0.0%
>          issued    : total=r=52426/w=5888/d=0, short=r=0/w=0/d=0
> 
>     Run status group 0 (all jobs):
>        READ: io=838816KB, aggrb=13974KB/s, minb=14309KB/s,
> maxb=14309KB/s, mint=60025msec, maxt=60025msec
>       WRITE: io=94208KB, aggrb=1569KB/s, minb=1607KB/s, maxb=1607KB/s,
> mint=60025msec, maxt=60025msec
> 
>     Disk stats (read/write):
>         md3: ios=58848/13987, merge=0/0, ticks=0/0, in_queue=0,
> util=0.00%, aggrios=14750/4159, aggrmerge=0/2861,
> aggrticks=112418/28260, aggrin_queue=140664, aggrutil=84.95%
>       sdc: ios=17688/4221, merge=0/2878, ticks=148664/37972,
> in_queue=186628, util=84.95%
>       sdd: ios=11801/4219, merge=0/2880, ticks=79396/29192,
> in_queue=108572, util=70.71%
>       sde: ios=16427/4099, merge=0/2843, ticks=129072/35252,
> in_queue=164304, util=81.57%
>       sdf: ios=13086/4097, merge=0/2845, ticks=92540/10624,
> in_queue=103152, util=60.02%
> 
> anything goes wrong here?
> 
> 
> --
> Xupeng Yun
> http://about.me/xupeng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html