Re: 4x lower IOPS: Linux MD vs indiv. devices - why?

Hi Tobias, 
MDRAID overhead is always there, but you can play with some tuning knobs. 
I recommend the following: 
1. Use many threads/jobs with a fairly high queue depth. The highest IOPS on Intel P3xxx drives is reached when you saturate each drive with about 128 outstanding 4k IOs; that can be 32 jobs at QD4, 16 jobs at QD8, and so on. With MDRAID on top, multiply by the number of drives in the array (16 drives x 128 = roughly 2048 IOs in flight). So I think the problem right now is simply that you're not submitting enough IOs - see the fio sketch after this list. 
2. Changing the drives' HW sector size to 4k may also help, if you're sure your workload is always 4k-granular (example command after the list).
3. Finally, use the "imsm" MDRAID extension and the latest mdadm build (container example after the list). 
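
To illustrate point 1: a minimal fio sketch (the job name and exact numbers are mine, untested on your box) that keeps ~2048 IOs in flight with libaio - 32 jobs at QD64 over 16 drives averages the ~128 outstanding 4k IOs per drive mentioned above:

[global]
# asynchronous submission instead of 2560 sync/QD1 threads
ioengine=libaio
direct=1
bs=4k
time_based=1
runtime=120
randrepeat=0
norandommap=1
group_reporting
filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev/nvme9n1:/dev/nvme10n1:/dev/nvme11n1:/dev/nvme12n1:/dev/nvme13n1:/dev/nvme14n1:/dev/nvme15n1

[randread-highqd]
rw=randread
# 32 jobs x QD64 = 2048 outstanding IOs ~= 128 per drive across 16 drives
numjobs=32
iodepth=64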
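For point 2, nvme-cli can reformat a namespace to a 4k LBA; a sketch only, since the right --lbaf index differs per drive and formatting destroys all data on the namespace:

# list the LBA formats the namespace supports (lbads:12 means a 4k sector)
sudo nvme id-ns /dev/nvme0n1 | grep lbaf
# reformat to the 4k format -- DESTROYS ALL DATA; index 1 is an assumption,
# use whichever entry above shows lbads:12
sudo nvme format /dev/nvme0n1 --lbaf=1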
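For point 3, an IMSM array is a container plus a volume carved out of it; this is a sketch assuming your platform's IMSM support extends to NVMe, and the /dev/md/imsm0 and /dev/md/vol0 names are my choice:

# create an IMSM container over the 16 namespaces (bash brace expansion)
sudo mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=16 /dev/nvme{0..15}n1
# create a RAID0 volume inside the container
sudo mdadm --create /dev/md/vol0 --level=0 --raid-devices=16 /dev/md/imsm0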

See some more hints here:
http://www.slidesearchengine.com/slide/hands-on-lab-how-to-unleash-your-storage-performance-by-using-nvm-express-based-pci-express-solid-state-drives
 
Some fio config examples for NVMe are here:
https://github.com/01org/fiovisualizer/tree/master/Workloads


-- 
Andrey Kudryavtsev, 

SSD Solution Architect
Intel Corp. 
inet: 83564353
work: +1-916-356-4353
mobile: +1-916-221-2281

On 1/23/17, 8:26 AM, "Tobias Oberstein" <tobias.oberstein@xxxxxxxxx> wrote:

    Hi,
    
    I have a question regarding Linux software RAID (MD) as tested with fio - so 
    this is slightly off-topic, but I am hoping for expert advice or a pointer to 
    a more appropriate place (if this is unwelcome here).
    
    I have a box with this HW:
    
    - 88-core Xeon E7 (176 hyperthreads) + 3TB RAM
    - 8 x Intel P3608 4TB NVMe (logically 16 NVMe devices, since each card has two controllers)
    
    With a random 4kB read load, I am able to max it out at 7 million IOPS - 
    but only if I run fio against the _individual_ NVMe devices.
    
    [global]
    group_reporting
    filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev/nvme9n1:/dev/nvme10n1:/dev/nvme11n1:/dev/nvme12n1:/dev/nvme13n1:/dev/nvme14n1:/dev/nvme15n1
    size=30G
    ioengine=sync
    iodepth=1
    thread=1
    direct=1
    time_based=1
    randrepeat=0
    norandommap=1
    bs=4k
    runtime=120
    
    [randread]
    stonewall
    rw=randread
    numjobs=2560
    
    When I create a stripe set over all devices:
    
    sudo mdadm --create /dev/md1 --chunk=8 --level=0 --raid-devices=16 \
        /dev/nvme0n1 \
        /dev/nvme1n1 \
        /dev/nvme2n1 \
        /dev/nvme3n1 \
        /dev/nvme4n1 \
        /dev/nvme5n1 \
        /dev/nvme6n1 \
        /dev/nvme7n1 \
        /dev/nvme8n1 \
        /dev/nvme9n1 \
        /dev/nvme10n1 \
        /dev/nvme11n1 \
        /dev/nvme12n1 \
        /dev/nvme13n1 \
        /dev/nvme14n1 \
        /dev/nvme15n1
    
    I only get 1.6 million IOPS. Detail results down below.
    
    Note: the array is created with an 8K chunk size because this is for a 
    database workload. Here I tested with a 4k block size, but it's 
    similar (lower performance on MD) with 8k.
    
    Any help or hints would be greatly appreciated!
    
    Cheers,
    /Tobias
    
    
    
    7 million IOPS on raw, individual NVMe devices
    ==============================================
    
    oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo 
    /opt/fio/bin/fio postgresql_storage_workload.fio
    randread: (g=0): rw=randread, bs=4096B-4096B,4096B-4096B,4096B-4096B, 
    ioengine=sync, iodepth=1
    ...
    fio-2.17-17-g9cf1
    Starting 2560 threads
    Jobs: 2367 (f=29896): 
    [_(2),f(3),_(2),f(11),_(2),f(2),_(9),f(1),_(1),f(1),_(3),f(1),_(1),f(1),_(13),f(1),_(8),f(1),_(1),f(4),_(2),f(1),_(1),f(1),_(3),f(2),_(3),f(3),_(8),f(2),_(1),f(3),_(3),f(60),_(1),f(20),_(1),f(33),_(1),f(14),_(1),f(18),_(4),f(6),_(1),f(6),_(1),f(1),_(1),f(1),_(1),f(4),_(1),f(2),_(1),f(11),_(1),f(11),_(4),f(74),_(1),f(8),_(1),f(11),_(1),f(8),_(1),f(61),_(1),f(38),_(1),f(31),_(1),f(5),_(1),f(103),_(1),f(24),E(1),f(27),_(1),f(28),_(1),f(1),_(1),f(134),_(1),f(62),_(1),f(48),_(1),f(27),_(1),f(59),_(1),f(30),_(1),f(14),_(1),f(25),_(1),f(2),_(1),f(25),_(1),f(31),_(1),f(9),_(1),f(7),_(1),f(8),_(1),f(13),_(1),f(28),_(1),f(7),_(1),f(84),_(1),f(42),_(1),f(5),_(1),f(8),_(1),f(20),_(1),f(15),_(1),f(19),_(1),f(3),_(1),f(19),_(1),f(7),_(1),f(17),_(1),f(34),_(1),f(1),_(1),f(4),_(1),f(1),_(1),f(1),_(2),f(3),_(1),f(1),_(1),f(1),_(1),f(8),_(1),f(6),_(1),f(3),_(1),f(3),_(1),f(53),_(1),f(7),_(1),f(19),_(1),f(6),_(1),f(5),_(1),f(22),_(1),f(11),_(1),f(12),_(1),f(3),_(1),f(16),_(1),f(149),_(1),f(20),_(1),f(27),_(1),f(7),_(1),f(29),_(1),f(2),_(1),f(11),_(1),f(46),_(1),f(8),_(2),f(1),_(1),f(1),_(1),f(14),E(1),f(4),_(1),f(22),_(1),f(11),_(1),f(70),_(2),f(11),_(1),f(2),_(1),f(1),_(1),f(1),_(1),f(21),_(1),f(8),_(1),f(4),_(1),f(45),_(2),f(1),_(1),f(18),_(1),f(12),_(1),f(6),_(1),f(5),_(1),f(27),_(1),f(3),_(1),f(3),_(1),f(19),_(1),f(4),_(1),f(25),_(1),f(4),_(1),f(1),_(1),f(2),_(1),f(1),_(1),f(13),_(1),f(18),_(1),f(1),_(1),f(1),_(1),f(29),_(1),f(27)][100.0%][r=21.1GiB/s,w=0KiB/s][r=5751k,w=0 
    IOPS][eta 00m:00s]
    randread: (groupid=0, jobs=2560): err= 0: pid=114435: Mon Jan 23 
    15:47:17 2017
        read: IOPS=6965k, BW=26.6GiB/s (28.6GB/s)(3189GiB/120007msec)
         clat (usec): min=38, max=33262, avg=360.11, stdev=465.36
          lat (usec): min=38, max=33262, avg=360.20, stdev=465.40
         clat percentiles (usec):
          |  1.00th=[  114],  5.00th=[  135], 10.00th=[  149], 20.00th=[  171],
          | 30.00th=[  191], 40.00th=[  213], 50.00th=[  239], 60.00th=[  270],
          | 70.00th=[  314], 80.00th=[  378], 90.00th=[  556], 95.00th=[  980],
          | 99.00th=[ 2704], 99.50th=[ 3312], 99.90th=[ 4576], 99.95th=[ 5216],
          | 99.99th=[ 8096]
         lat (usec) : 50=0.01%, 100=0.11%, 250=53.75%, 500=34.23%, 750=5.23%
         lat (usec) : 1000=1.79%
         lat (msec) : 2=2.88%, 4=1.81%, 10=0.20%, 20=0.01%, 50=0.01%
       cpu          : usr=0.63%, sys=4.89%, ctx=837434400, majf=0, minf=2557
       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
     >=64=0.0%
          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
     >=64=0.0%
          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
     >=64=0.0%
          issued rwt: total=835852266,0,0, short=0,0,0, dropped=0,0,0
          latency   : target=0, window=0, percentile=100.00%, depth=1
    
    Run status group 0 (all jobs):
        READ: bw=26.6GiB/s (28.6GB/s), 26.6GiB/s-26.6GiB/s 
    (28.6GB/s-28.6GB/s), io=3189GiB (3424GB), run=120007-120007msec
    
    Disk stats (read/write):
       nvme0n1: ios=52191377/0, merge=0/0, ticks=14400568/0, 
    in_queue=14802400, util=100.00%
       nvme1n1: ios=52241684/0, merge=0/0, ticks=13919744/0, 
    in_queue=15101276, util=100.00%
       nvme2n1: ios=52241537/0, merge=0/0, ticks=11146952/0, 
    in_queue=12053112, util=100.00%
       nvme3n1: ios=52241416/0, merge=0/0, ticks=10806624/0, 
    in_queue=11135004, util=100.00%
       nvme4n1: ios=52241285/0, merge=0/0, ticks=19320448/0, 
    in_queue=21079576, util=100.00%
       nvme5n1: ios=52241142/0, merge=0/0, ticks=18786968/0, 
    in_queue=19393024, util=100.00%
       nvme6n1: ios=52241000/0, merge=0/0, ticks=19610892/0, 
    in_queue=20140104, util=100.00%
       nvme7n1: ios=52240874/0, merge=0/0, ticks=20482920/0, 
    in_queue=21090048, util=100.00%
       nvme8n1: ios=52240731/0, merge=0/0, ticks=14533992/0, 
    in_queue=14929172, util=100.00%
       nvme9n1: ios=52240587/0, merge=0/0, ticks=12854956/0, 
    in_queue=13919288, util=100.00%
       nvme10n1: ios=52240447/0, merge=0/0, ticks=11085508/0, 
    in_queue=11390392, util=100.00%
       nvme11n1: ios=52240301/0, merge=0/0, ticks=18490260/0, 
    in_queue=20110288, util=100.00%
       nvme12n1: ios=52240097/0, merge=0/0, ticks=11377884/0, 
    in_queue=11683568, util=100.00%
       nvme13n1: ios=52239956/0, merge=0/0, ticks=15205304/0, 
    in_queue=16314628, util=100.00%
       nvme14n1: ios=52239766/0, merge=0/0, ticks=27003788/0, 
    in_queue=27659920, util=100.00%
       nvme15n1: ios=52239620/0, merge=0/0, ticks=17352624/0, 
    in_queue=17910636, util=100.00%
    
    
    1.6 million IOPS on Linux MD over 16 NVMe devices
    =================================================
    
    oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo 
    /opt/fio/bin/fio postgresql_storage_workload.fio
    randread: (g=0): rw=randread, bs=4096B-4096B,4096B-4096B,4096B-4096B, 
    ioengine=sync, iodepth=1
    ...
    fio-2.17-17-g9cf1
    Starting 2560 threads
    Jobs: 2560 (f=2560): [r(2560)][100.0%][r=6212MiB/s,w=0KiB/s][r=1590k,w=0 
    IOPS][eta 00m:00s]
    randread: (groupid=0, jobs=2560): err= 0: pid=146070: Mon Jan 23 
    17:21:15 2017
        read: IOPS=1588k, BW=6204MiB/s (6505MB/s)(728GiB/120098msec)
         clat (usec): min=27, max=28498, avg=124.51, stdev=113.10
          lat (usec): min=27, max=28498, avg=124.58, stdev=113.10
         clat percentiles (usec):
          |  1.00th=[   78],  5.00th=[   84], 10.00th=[   86], 20.00th=[   89],
          | 30.00th=[   95], 40.00th=[  102], 50.00th=[  105], 60.00th=[  108],
          | 70.00th=[  118], 80.00th=[  133], 90.00th=[  173], 95.00th=[  221],
          | 99.00th=[  358], 99.50th=[  506], 99.90th=[ 2192], 99.95th=[ 2608],
          | 99.99th=[ 2960]
         lat (usec) : 50=0.06%, 100=35.14%, 250=61.83%, 500=2.46%, 750=0.19%
         lat (usec) : 1000=0.07%
         lat (msec) : 2=0.13%, 4=0.12%, 10=0.01%, 20=0.01%, 50=0.01%
       cpu          : usr=0.08%, sys=4.49%, ctx=200431993, majf=0, minf=2557
       IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
     >=64=0.0%
          submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
     >=64=0.0%
          complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
     >=64=0.0%
          issued rwt: total=190730463,0,0, short=0,0,0, dropped=0,0,0
          latency   : target=0, window=0, percentile=100.00%, depth=1
    
    Run status group 0 (all jobs):
        READ: bw=6204MiB/s (6505MB/s), 6204MiB/s-6204MiB/s 
    (6505MB/s-6505MB/s), io=728GiB (781GB), run=120098-120098msec
    
    Disk stats (read/write):
         md1: ios=190632612/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
    aggrios=11920653/0, aggrmerge=0/0, aggrticks=1228287/0, 
    aggrin_queue=1247601, aggrutil=100.00%
       nvme15n1: ios=11919850/0, merge=0/0, ticks=1214924/0, 
    in_queue=1225896, util=100.00%
       nvme6n1: ios=11921162/0, merge=0/0, ticks=1182716/0, 
    in_queue=1191452, util=100.00%
       nvme9n1: ios=11916313/0, merge=0/0, ticks=1265060/0, 
    in_queue=1296728, util=100.00%
       nvme11n1: ios=11922174/0, merge=0/0, ticks=1206084/0, 
    in_queue=1239808, util=100.00%
       nvme2n1: ios=11921547/0, merge=0/0, ticks=1238956/0, 
    in_queue=1272916, util=100.00%
       nvme14n1: ios=11923176/0, merge=0/0, ticks=1168688/0, 
    in_queue=1178360, util=100.00%
       nvme5n1: ios=11923142/0, merge=0/0, ticks=1192656/0, 
    in_queue=1207808, util=100.00%
       nvme8n1: ios=11921507/0, merge=0/0, ticks=1250164/0, 
    in_queue=1258956, util=100.00%
       nvme10n1: ios=11919058/0, merge=0/0, ticks=1294028/0, 
    in_queue=1304536, util=100.00%
       nvme1n1: ios=11923129/0, merge=0/0, ticks=1246892/0, 
    in_queue=1281952, util=100.00%
       nvme13n1: ios=11923354/0, merge=0/0, ticks=1241540/0, 
    in_queue=1271820, util=100.00%
       nvme4n1: ios=11926936/0, merge=0/0, ticks=1190384/0, 
    in_queue=1224192, util=100.00%
       nvme7n1: ios=11921139/0, merge=0/0, ticks=1200624/0, 
    in_queue=1214240, util=100.00%
       nvme0n1: ios=11916614/0, merge=0/0, ticks=1230916/0, 
    in_queue=1242372, util=100.00%
       nvme12n1: ios=11916963/0, merge=0/0, ticks=1266840/0, 
    in_queue=1277600, util=100.00%
       nvme3n1: ios=11914399/0, merge=0/0, ticks=1262128/0, 
    in_queue=1272988, util=100.00%
    oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$
