Okay, well, I've run your fio config... but with so many results and abbreviations I currently feel a bit overwhelmed :) So please don't take it badly if I just paste the whole results. I have tried with a stripe cache size of 1024, 2048 and 4096. Btw, I also have /dev/md1, a 15GB unencrypted array using the same 5 disks as my LUKS md2 array, so nearly the same setup just without the LUKS layer. If helpful, I can execute some fio tests on this filesystem, too.

So long :)
Kevin

$ echo 1024 > /sys/block/md2/md/stripe_cache_size

> Jobs: 1 (f=1): [____________W___] [99.7% done] [0K/99.24M /s] [0 /193 iops] [eta 00m:06s]
> read: (groupid=0, jobs=8): err= 0: pid=12987
>   read : io=81920MB, bw=189835KB/s, iops=370, runt=441890msec
>     slat (usec): min=32, max=4561, avg=76.28, stdev=28.66
>     clat (msec): min=5, max=1115, avg=334.19, stdev=151.18
>      lat (msec): min=5, max=1115, avg=334.26, stdev=151.18
>     bw (KB/s) : min=0, max=261120, per=12.79%, avg=24288.95, stdev=11586.29
>   cpu : usr=0.05%, sys=0.50%, ctx=157180, majf=0, minf=16982
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=163840/0/0, short=0/0/0
>   lat (msec): 10=0.52%, 20=0.71%, 50=3.01%, 100=6.17%, 250=14.59%
>   lat (msec): 500=68.81%, 750=4.87%, 1000=1.08%, 2000=0.25%
> write: (groupid=1, jobs=8): err= 0: pid=13202
>   write: io=81920MB, bw=58504KB/s, iops=114, runt=1433851msec
>     slat (usec): min=45, max=1729, avg=212.20, stdev=56.68
>     clat (msec): min=14, max=11691, avg=1101.17, stdev=1116.82
>      lat (msec): min=14, max=11691, avg=1101.39, stdev=1116.82
>     bw (KB/s) : min=0, max=106666, per=14.35%, avg=8395.94, stdev=6752.35
>   cpu : usr=0.28%, sys=0.10%, ctx=117451, majf=0, minf=3410
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=0/163840/0, short=0/0/0
>   lat (msec): 20=0.01%, 50=0.17%, 100=2.03%, 250=17.02%, 500=22.20%
>   lat (msec): 750=12.22%, 1000=8.66%, 2000=20.04%, >=2000=17.64%
>
> Run status group 0 (all jobs):
>   READ: io=81920MB, aggrb=189834KB/s, minb=194390KB/s, maxb=194390KB/s, mint=441890msec, maxt=441890msec
>
> Run status group 1 (all jobs):
>   WRITE: io=81920MB, aggrb=58504KB/s, minb=59908KB/s, maxb=59908KB/s, mint=1433851msec, maxt=1433851msec
>
> Disk stats (read/write):
>   dm-0: ios=327681/327756, merge=0/0, ticks=78591352/353235376, in_queue=431834680, util=100.00%, aggrios=327681/327922, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   md2: ios=327681/327922, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=171660/222845, aggrmerge=4561629/9494657, aggrticks=16341417/4008187, aggrin_queue=20351472, aggrutil=85.78%
>   sdc: ios=181512/222455, merge=4583055/9505634, ticks=15650600/5944080, in_queue=21596560, util=85.78%
>   sdd: ios=180545/224362, merge=4526197/9587956, ticks=14356708/5542120, in_queue=19900820, util=85.42%
>   sde: ios=179853/224317, merge=4519718/9540999, ticks=13375156/5676828, in_queue=19053876, util=83.90%
>   sdf: ios=157605/222569, merge=4551205/9459549, ticks=18828608/1234632, in_queue=20065204, util=75.36%
>   sdg: ios=158787/220525, merge=4627970/9379150, ticks=19496016/1643276, in_queue=21140904, util=77.26%
$ echo 2048 > /sys/block/md2/md/stripe_cache_size

> Jobs: 1 (f=1): [_________W______] [99.6% done] [0K/92182K /s] [0 /175 iops] [eta 00m:06s]
> read: (groupid=0, jobs=8): err= 0: pid=6392
>   read : io=81920MB, bw=185893KB/s, iops=363, runt=451259msec
>     slat (usec): min=32, max=524, avg=75.08, stdev=26.00
>     clat (msec): min=8, max=1849, avg=335.65, stdev=149.76
>      lat (msec): min=8, max=1849, avg=335.72, stdev=149.76
>     bw (KB/s) : min=0, max=105860, per=13.08%, avg=24308.99, stdev=7467.64
>   cpu : usr=0.05%, sys=0.49%, ctx=157968, majf=0, minf=17171
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=163840/0/0, short=0/0/0
>   lat (msec): 10=0.01%, 20=0.06%, 50=2.32%, 100=7.44%, 250=15.36%
>   lat (msec): 500=69.09%, 750=4.50%, 1000=0.87%, 2000=0.35%
> write: (groupid=1, jobs=8): err= 0: pid=6663
>   write: io=81920MB, bw=75377KB/s, iops=147, runt=1112887msec
>     slat (usec): min=46, max=6453.8K, avg=988.13, stdev=46664.22
>     clat (msec): min=10, max=8631, avg=854.18, stdev=683.37
>      lat (msec): min=10, max=8631, avg=855.17, stdev=684.82
>     bw (KB/s) : min=0, max=100352, per=14.04%, avg=10581.97, stdev=6871.80
>   cpu : usr=0.34%, sys=0.11%, ctx=92502, majf=0, minf=1531
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=0/163840/0, short=0/0/0
>   lat (msec): 20=0.01%, 50=0.09%, 100=1.14%, 250=11.19%, 500=24.64%
>   lat (msec): 750=18.78%, 1000=13.40%, 2000=24.53%, >=2000=6.23%
>
> Run status group 0 (all jobs):
>   READ: io=81920MB, aggrb=185893KB/s, minb=190354KB/s, maxb=190354KB/s, mint=451259msec, maxt=451259msec
>
> Run status group 1 (all jobs):
>   WRITE: io=81920MB, aggrb=75376KB/s, minb=77186KB/s, maxb=77186KB/s, mint=1112887msec, maxt=1112887msec
>
> Disk stats (read/write):
>   dm-0: ios=327701/328169, merge=0/0, ticks=79402348/261890568, in_queue=341301888, util=100.00%, aggrios=327701/328481, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   md2: ios=327701/328481, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=127112/146994, aggrmerge=3364061/7997232, aggrticks=18091347/5817941, aggrin_queue=23914356, aggrutil=94.20%
>   sdc: ios=135729/149357, merge=3341205/8134447, ticks=18312932/8398716, in_queue=26721324, util=92.77%
>   sdd: ios=135561/151584, merge=3312121/8238249, ticks=16877204/8190968, in_queue=25077332, util=92.44%
>   sde: ios=135741/146023, merge=3345948/7981968, ticks=17659792/8659780, in_queue=26322124, util=94.20%
>   sdf: ios=114396/143768, merge=3413295/7801050, ticks=18551976/1652924, in_queue=20207384, util=72.89%
>   sdg: ios=114134/144241, merge=3407738/7830447, ticks=19054832/2187320, in_queue=21243620, util=74.96%

$ echo 4096 > /sys/block/md2/md/stripe_cache_size

> Jobs: 1 (f=1): [________W_______] [100.0% done] [0K/95848K /s] [0 /182 iops] [eta 00m:00s]
> read: (groupid=0, jobs=8): err= 0: pid=11787
>   read : io=81920MB, bw=189274KB/s, iops=369, runt=443200msec
>     slat (usec): min=31, max=4511, avg=75.47, stdev=29.74
>     clat (msec): min=5, max=1338, avg=336.39, stdev=155.14
>      lat (msec): min=5, max=1338, avg=336.47, stdev=155.14
>     bw (KB/s) : min=0, max=253455, per=12.77%, avg=24162.01, stdev=11368.71
>   cpu : usr=0.05%, sys=0.49%, ctx=157193, majf=0, minf=17313
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=163840/0/0, short=0/0/0
>   lat (msec): 10=0.47%, 20=0.66%, 50=2.95%, 100=6.33%, 250=14.56%
>   lat (msec): 500=68.12%, 750=5.42%, 1000=1.15%, 2000=0.33%
> write: (groupid=1, jobs=8): err= 0: pid=12060
>   write: io=81920MB, bw=64993KB/s, iops=126, runt=1290687msec
>     slat (usec): min=61, max=16991, avg=197.22, stdev=110.87
>     clat (msec): min=14, max=2820, avg=980.92, stdev=366.56
>      lat (msec): min=14, max=2821, avg=981.12, stdev=366.56
>     bw (KB/s) : min=0, max=103770, per=13.11%, avg=8517.92, stdev=3794.28
>   cpu : usr=0.28%, sys=0.08%, ctx=84352, majf=0, minf=723
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
>     submit   : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>     issued r/w/d: total=0/163840/0, short=0/0/0
>   lat (msec): 20=0.01%, 50=0.03%, 100=0.28%, 250=0.22%, 500=5.37%
>   lat (msec): 750=22.02%, 1000=31.66%, 2000=39.27%, >=2000=1.16%
>
> Run status group 0 (all jobs):
>   READ: io=81920MB, aggrb=189273KB/s, minb=193816KB/s, maxb=193816KB/s, mint=443200msec, maxt=443200msec
>
> Run status group 1 (all jobs):
>   WRITE: io=81920MB, aggrb=64993KB/s, minb=66553KB/s, maxb=66553KB/s, mint=1290687msec, maxt=1290687msec
>
> Disk stats (read/write):
>   dm-0: ios=327681/327629, merge=0/0, ticks=78990724/301988444, in_queue=380991692, util=100.00%, aggrios=327681/327709, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   md2: ios=327681/327709, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=120158/119573, aggrmerge=2858405/7663126, aggrticks=17291831/9119008, aggrin_queue=26414023, aggrutil=99.60%
>   sdc: ios=135579/119976, merge=2813832/7324879, ticks=13974928/2192484, in_queue=16167996, util=66.57%
>   sdd: ios=136115/127048, merge=2826584/7736191, ticks=12932248/2477796, in_queue=15410924, util=68.08%
>   sde: ios=136007/130908, merge=2844473/7936354, ticks=12642232/3141268, in_queue=15784336, util=71.86%
>   sdf: ios=78473/94458, merge=2882361/7865984, ticks=29053772/37421808, in_queue=66488856, util=99.60%
>   sdg: ios=114620/125479, merge=2924777/7452224, ticks=17855976/361684, in_queue=18218004, util=54.84%
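By the way, a sweep like this can be scripted instead of set by hand. A minimal sketch, assuming the job file from Stan's mail is saved as stan.fio (a hypothetical name) with its directory= pointing at a filesystem on md2, run as root:

#!/bin/bash
# Sweep the md stripe cache and keep one fio log per size.
for size in 256 1024 2048 4096; do
    echo "$size" > /sys/block/md2/md/stripe_cache_size
    sync
    echo 3 > /proc/sys/vm/drop_caches    # drop the page cache between runs, per Stan's advice
    fio stan.fio > "fio-stripe-${size}.log"
done

Note that stripe_cache_size does not survive a reboot, so whichever value wins has to be reapplied at boot time (e.g. from rc.local or a udev rule).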
On 20.12.2013 13:36, Stan Hoeppner wrote:
> On 12/20/2013 4:26 AM, Kevin Richter wrote:
>> Thanks a lot for your huge replies!
>
> You're welcome.
>
>>> Oh, that's quite old. I'd suggest upgrading to a much more recent
>>> kernel as we've fixed lots of issues in this area since then.
>>
>> First I have switched to the newer kernel from Ubuntu Saucy:
>> $ uname -a
>> Linux 3.11.0-14-generic #21~precise1-Ubuntu SMP
>>
>> Thus, it seems that the default scheduler has been changed to deadline.
>> I did not change anything; after a reboot the schedulers of all disks
>> are now deadline.
>
> Good move Ubuntu.
>
>>> Model # of the CPUs so I can look up the specs?
>> Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
>
> Strong CPUs.
>
>> I have prepared a folder with 60GB for the tests. This is nearly twice
>> the available memory, so the process should be forced to actually
>> write the stuff to the disk - and not only hold it in memory.
>>
>>> $ echo 256 > /sys/block/md2/md/stripe_cache_size
>>> $ time cp -a /olddisk/testfolder /6tb/foo1/
>>> real    25m38.925s
>>> user    0m0.595s
>>> sys     1m23.182s
>>>
>>> $ echo 1024 > /sys/block/md2/md/stripe_cache_size
>>> $ time cp -a /olddisk/testfolder /raid/foo2/
>>> real    7m32.824s
>>> user    0m0.438s
>>> sys     1m6.759s
>>>
>>> $ echo 2048 > /sys/block/md2/md/stripe_cache_size
>>> $ time cp -a /olddisk/testfolder /raid/foo3/
>>> real    5m32.847s
>>> user    0m0.418s
>>> sys     1m5.671s
>>>
>>> $ echo 4096 > /sys/block/md2/md/stripe_cache_size
>>> $ time cp -a /olddisk/testfolder /raid/foo4/
>>> real    5m54.554s
>>> user    0m0.437s
>>> sys     1m6.268s
>>
>> The difference is really amazing! So 2048 seems to be the best choice.
>> 60GB in 5.5 minutes is about 180MB/s. That sounds a bit high, doesn't it?
>> The RAID only consists of 5 SATA disks with 7200rpm.
>
> A lot of the source data is being cached between runs so these numbers
> aren't accurate. The throughput of this copy operation will be limited
> by the speed of the single source disk, not the array. To make the
> elapsed times of this copy test accurate you need to execute something
> like these commands after each run:
>
> # sync
> # echo 3 > /proc/sys/vm/drop_caches
>
> But this copy test will not inform you about the potential peak
> performance of your array. That's why I suggested you test with FIO,
> the flexible IO tester.
>
> # aptitude install fio
> # man fio
>
> Sample job file suitable for your system:
>
> [global]
> directory=/your/XFS/test/directory
> zero_buffers
> numjobs=8
> group_reporting
> blocksize=512k
> ioengine=libaio
> iodepth=16
> direct=1
> size=10g
>
> [read]
> rw=read
> stonewall
>
> [write]
> rw=write
> stonewall
>
> This should give you a relatively accurate picture of the actual
> potential throughput of your array and filesystem.
>
>> 'top' while copying with stripe cache size of 2048 (the source disk is ntfs):
>>> top - 10:48:24 up 1 day, 1:41, 2 users, load average: 5.66, 3.53, 2.17
>>> Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 0.1%us, 35.8%sy, 0.0%ni, 46.0%id, 17.9%wa, 0.0%hi, 0.2%si, 0.0%st
>>> Mem:  32913992k total, 32709208k used, 204784k free, 10770344k buffers
>>> Swap:  7812496k total,        0k used, 7812496k free, 20866844k cached
>>>
>>>   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>> 19524 root  20  0     0    0    0 R   93  0.0  4:00.12 kworker/3:1
>>> 23744 root  20  0     0    0    0 S   55  0.0  0:50.84 kworker/0:1
>>> 23738 root  20  0     0    0    0 S   29  0.0  0:56.94 kworker/4:0
>>>  3893 root  20  0     0    0    0 S   28  0.0 36:47.50 md2_raid6
>>>  4551 root  20  0 22060 3328  720 D   25  0.0 20:21.61 mount.ntfs
>>> 23273 root  20  0     0    0    0 S   22  0.0  1:54.86 kworker/7:2
>>> 23734 root  20  0 21752 1280 1040 D   21  0.0  0:49.84 cp
>>>    84 root  20  0     0    0    0 S    7  0.0  8:19.34 kswapd1
>>>    83 root  20  0     0    0    0 S    6  0.0 11:55.81 kswapd0
>>> 23745 root  20  0     0    0    0 S    2  0.0  0:33.60 kworker/1:2
>>> 21598 root  20  0     0    0    0 D    1  0.0  0:11.33 kworker/u17:1
>
> Hmm, what's kworker/3:1? That's not a crypto thread eating 93% of a
> SandyBridge core at only ~180 MB/s throughput, is it?
>
>> And the best thing of all:
>> During all of these tests there were no warnings/panics in the syslog.
>>
>> With best regards,
>> Kevin
>
> Even though XFS wasn't the cause of the problem I'm glad we were able to
> help you fix it nonetheless. I'm really curious to see what kind of
> throughput you can achieve with FIO, and whether crypto is a bottleneck
> at the 250-350 MB/s your array should be capable of. It would be great
> if you would play around a bit with FIO and post some numbers.
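Stan's crypto question suggests two quick checks. A sketch, not from the original thread: cryptsetup 1.6 and later ships a built-in cipher speed test, and the same fio job can be pointed at the unencrypted md1 array Kevin mentioned (stan.fio again being a hypothetical file name):

# Raw AES/dm-crypt throughput of the CPU, independent of the disks:
$ cryptsetup benchmark

# Same fio job against the unencrypted /dev/md1 for comparison.
# Point directory= at a filesystem on md1 first, and shrink size=,
# since 8 jobs x 10g will not fit on a 15GB array:
$ fio stan.fio

If md1 comes out much faster than md2 across otherwise identical disks, the LUKS layer is the bottleneck.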