Hi David,

Thanks for posting those results.

From the fio runs, I see you are getting around 200 IOPS at a 128KB write IO size. For the cluster you described in your initial post I would expect somewhere around 200-300 IOPS, so it looks like it's performing about right. 200 IOPS * 128KB is around 25MB/s, so that's as good as you are going to get, even in an ideal environment with a high queue depth.

The iostat output shows that each ZFS receive is more or less doing single-threaded writes, which is why a single ZFS receive is so slow and why 8 concurrent receives scale accordingly. It looks like each ZFS operation has to wait for the previous one to finish before the next is submitted.

There are a couple of options; I know some are not applicable, but I'm listing them for completeness:

1. SSD journals will give you a significant boost, maybe 5-6x.
2. Doubling the number of disks will probably double performance, if you can get the queue depth high enough (i.e. more concurrent ZFS receives).

These next two are specific to ZFS, and I'm not sure if they are possible as I don't have much knowledge of ZFS:

1. Make ZFS receive do larger IOs. If your cluster can do ~40 IOPS per thread, then RBD bandwidth will scale with increasing IO size.
2. Somehow get ZFS to coalesce the writes so that they are submitted to the RBD at higher queue depths and larger block sizes. I'm not sure if there are ZIL parameters that can be tuned to achieve this.
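As a back-of-the-envelope check on the numbers above, here is a small sketch (the helper function is mine; the IOPS and block-size figures are the ones quoted in this thread) of how bandwidth = IOPS * IO size plays out:

```python
def bandwidth_mb_s(iops: float, block_size_kb: float) -> float:
    """Write bandwidth in MB/s for a given IOPS rate and IO size."""
    return iops * block_size_kb / 1024.0

# Measured case from the fio run: ~200 IOPS at 128KB writes -> ~25 MB/s.
print(bandwidth_mb_s(200, 128))   # 25.0

# If the cluster sustains ~40 IOPS per thread, 1MB IOs instead of 128KB
# would lift a single thread from ~5 MB/s to ~40 MB/s.
print(bandwidth_mb_s(40, 1024))   # 40.0
```

The point is that per-IO latency, not raw disk bandwidth, is the bottleneck here, so either bigger IOs or more IOs in flight recover throughput.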
You can see the type of effect you could achieve with these last 2 by running a fio job with iodepth=8 and bs=1M.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> J David
> Sent: 24 April 2015 01:20
> To: Mark Nelson
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Having trouble getting good performance
>
> On Thu, Apr 23, 2015 at 4:23 PM, Mark Nelson <mnelson@xxxxxxxxxx>
> wrote:
> > If you want to adjust the iodepth, you'll need to use an asynchronous
> > ioengine like libaio (you also need to use direct=1)
>
> Ah yes, libaio makes a big difference. With 1 job:
>
> testfile: (g=0): rw=randwrite, bs=128K-128K/128K-128K/128K-128K,
> ioengine=libaio, iodepth=64
> fio-2.1.3
> Starting 1 process
>
> testfile: (groupid=0, jobs=1): err= 0: pid=6290: Thu Apr 23 20:43:27 2015
>   write: io=30720MB, bw=28503KB/s, iops=222, runt=1103633msec
>     slat (usec): min=12, max=1049.4K, avg=2427.89, stdev=13913.04
>     clat (msec): min=4, max=1975, avg=284.97, stdev=268.71
>      lat (msec): min=4, max=1975, avg=287.40, stdev=268.37
>     clat percentiles (msec):
>      | 1.00th=[ 7], 5.00th=[ 11], 10.00th=[ 20], 20.00th=[ 36],
>      | 30.00th=[ 60], 40.00th=[ 120], 50.00th=[ 219], 60.00th=[ 318],
>      | 70.00th=[ 416], 80.00th=[ 519], 90.00th=[ 652], 95.00th=[ 766],
>      | 99.00th=[ 1090], 99.50th=[ 1221], 99.90th=[ 1516], 99.95th=[ 1598],
>      | 99.99th=[ 1860]
>     bw (KB /s): min= 236, max=170082, per=100.00%, avg=29037.74,
> stdev=15788.85
>     lat (msec) : 10=4.63%, 20=5.77%, 50=16.59%, 100=10.64%, 250=15.40%
>     lat (msec) : 500=25.38%, 750=15.89%, 1000=4.00%, 2000=1.70%
>   cpu : usr=0.37%, sys=1.00%, ctx=99920, majf=0, minf=27
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
> >=64=100.0%
>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
>      issued : total=r=0/w=245760/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE:
io=30720MB, aggrb=28503KB/s, minb=28503KB/s, maxb=28503KB/s,
> mint=1103633msec, maxt=1103633msec
>
> Disk stats (read/write):
>   vdb: ios=0/246189, merge=0/219, ticks=0/67559576, in_queue=67564864,
> util=100.00%
>
> With 2 jobs:
>
> testfile: (g=0): rw=randwrite, bs=128K-128K/128K-128K/128K-128K,
> ioengine=libaio, iodepth=64
> testfile: (g=0): rw=randwrite, bs=128K-128K/128K-128K/128K-128K,
> ioengine=libaio, iodepth=64
> fio-2.1.3
> Starting 2 processes
>
> testfile: (groupid=0, jobs=2): err= 0: pid=6394: Thu Apr 23 21:24:09 2015
>   write: io=46406MB, bw=26384KB/s, iops=206, runt=1801073msec
>     slat (usec): min=11, max=3457.7K, avg=9589.56, stdev=44841.01
>     clat (msec): min=5, max=5256, avg=611.29, stdev=507.51
>      lat (msec): min=5, max=5256, avg=620.88, stdev=510.21
>     clat percentiles (msec):
>      | 1.00th=[ 25], 5.00th=[ 62], 10.00th=[ 102], 20.00th=[ 192],
>      | 30.00th=[ 293], 40.00th=[ 396], 50.00th=[ 502], 60.00th=[ 611],
>      | 70.00th=[ 742], 80.00th=[ 930], 90.00th=[ 1254], 95.00th=[ 1582],
>      | 99.00th=[ 2376], 99.50th=[ 2769], 99.90th=[ 3687], 99.95th=[ 4080],
>      | 99.99th=[ 4686]
>     bw (KB /s): min= 98, max=108111, per=53.88%, avg=14214.41,
> stdev=10031.64
>     lat (msec) : 10=0.24%, 20=0.46%, 50=2.85%, 100=6.27%, 250=16.04%
>     lat (msec) : 500=24.00%, 750=20.47%, 1000=12.35%, 2000=15.14%,
> >=2000=2.17%
>   cpu : usr=0.18%, sys=0.49%, ctx=291909, majf=0, minf=55
>   IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
> >=64=100.0%
>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
> >=64=0.0%
>      issued : total=r=0/w=371246/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>   WRITE: io=46406MB, aggrb=26383KB/s, minb=26383KB/s, maxb=26383KB/s,
> mint=1801073msec, maxt=1801073msec
>
> Disk stats (read/write):
>   vdb: ios=0/371958, merge=0/358, ticks=0/111668288, in_queue=111672480,
> util=100.00%
>
> And here is some "iostat -xt 10" from the start of the ZFS machine
doing a
> snapshot receive: (vdb = the Ceph RBD)
>
> 04/24/2015 12:12:50 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           0.10  0.00    0.30    0.00   0.00 99.60
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.10 0.00 0.40
> 8.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:00 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           0.60  0.00    1.20    9.27   0.00 88.93
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.20 1.70 2.40 6.80
> 9.68 0.01 3.37 20.00 1.41 3.37 0.64
> vdb 0.00 0.00 0.20 13.50 0.50 187.10
> 27.39 0.26 18.86 112.00 17.48 13.55 18.56
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:10 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           6.97  0.00    4.46   70.78   0.05 17.74
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 1.10 0.00 0.50 0.00 6.40
> 25.60 0.00 8.80 0.00 8.80 8.80 0.44
> vdb 0.00 0.00 91.00 27.90 348.00 247.45
> 10.02 1.73 14.55 10.82 26.74 8.32 98.88
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:20 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           3.52  0.00    4.52   72.23   0.10 19.64
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.20 0.00 0.40 0.00 2.40
> 12.00 0.00 9.00 0.00 9.00 9.00 0.36
> vdb 0.00 0.00 107.30 42.00 299.75 3150.00
> 46.21 2.18 14.57 10.93 23.88 6.68 99.68
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:30 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           3.32  0.00    6.10   81.31   0.10  9.17
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await
r_await w_await svctm %util
> vda 0.00 0.20 0.00 0.40 0.00 2.40
> 12.00 0.00 9.00 0.00 9.00 9.00 0.36
> vdb 0.00 0.00 111.50 40.30 342.05 2023.25
> 31.16 2.03 13.37 9.55 23.92 6.46 98.04
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:40 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           3.77  0.00    4.63   77.62   0.05 13.93
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 1.30 0.00 5.20
> 8.00 0.01 5.54 0.00 5.54 5.54 0.72
> vdb 0.00 0.00 99.20 42.70 362.30 1653.00
> 28.40 2.10 14.67 11.04 23.09 7.02 99.68
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:13:50 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           0.70  0.00    1.96   93.98   0.10  3.26
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.20 0.00 0.40 0.00 2.40
> 12.00 0.01 15.00 0.00 15.00 15.00 0.60
> vdb 0.00 0.00 62.20 41.20 128.75 1604.05
> 33.52 2.05 20.03 16.49 25.38 9.67 99.96
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> From the above, avgqu-sz seems to park at 2.
With 4 receives running
> simultaneously, it looks like this:
>
> 04/24/2015 12:18:20 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          53.38  0.00   26.97   15.29   2.56  1.80
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.20 0.00 2.30 0.00 9.60
> 8.35 0.00 3.65 0.00 3.65 1.04 0.24
> vdb 0.00 0.00 244.90 117.20 1720.30 8975.55
> 59.08 10.80 29.71 12.49 65.71 2.71 98.28
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:18:30 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          16.48  0.00   11.05   44.73   0.21 27.53
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 119.10 155.40 609.55 1597.00
> 16.08 11.10 40.92 9.59 64.93 3.64 100.00
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:18:40 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>           7.76  0.00   28.24   40.30   0.10 23.60
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 106.60 105.10 619.85 2152.20
> 26.19 8.51 40.04 11.31 69.17 4.15 87.96
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:18:50 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          12.34  0.00   24.95   52.41   0.05 10.25
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 171.90 144.20 834.75 14364.55
> 96.17 11.74 37.27 10.79 68.82 3.16 99.96
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:19:00 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          40.51  0.00   26.40   30.74   0.05  2.30
>
> Device: rrqm/s wrqm/s
r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 163.40 149.70 1415.25 15792.25
> 109.92 12.00 38.26 14.11 64.63 3.16 99.04
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:19:10 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          20.83  0.00    7.27   45.39   0.46 26.05
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 42.00 148.20 274.05 15151.65
> 162.21 10.39 54.74 10.75 67.21 5.25 99.88
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> 04/24/2015 12:19:20 AM
> avg-cpu: %user %nice %system %iowait %steal %idle
>          23.65  0.00    7.06   47.36   0.41 21.52
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> vda 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> vdb 0.00 0.00 18.00 160.40 141.85 7149.35
> 81.74 10.25 57.28 16.56 61.85 5.61 100.00
> vdc 0.00 0.00 0.00 0.00 0.00 0.00
> 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> Thanks!
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com