Re: slow requests and short OSD failures in small cluster

Nick Fisk <nick@xxxxxxxxxx> · Thu, 20 Apr 2017 14:45:55 +0100

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jogi Hofmüller
> Sent: 20 April 2017 13:51
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  slow requests and short OSD failures in small cluster
> 
> Hi,
> 
> On Tuesday, 18.04.2017, at 18:34 +0000, Peter Maloney wrote:
> 
> > The 'slower with every snapshot even after CoW totally flattens it'
> > issue I just find easy to test, and I didn't test it on hammer or
> > earlier, and others confirmed it, but didn't keep track of the
> > versions. Just make an rbd image, map it (probably... but my tests
> > were with qemu librbd), do fio randwrite tests with sync and direct on
> > the device (no need for a fs, or anything), and then make a few snaps
> > and watch it go way slower.
> >
> > How about we make this thread a collection of versions then. And I'll
> > redo my test on Thursday maybe.
> 
> I did some tests now and provide the results and observations here:
> 
> This is the fio config file I used:
> 
> <rbd.fio>
> [global]
> ioengine=rbd
> clientname=admin
> pool=images
> rbdname=benchmark
> invalidate=0    # mandatory
> rw=randwrite
> bs=4k
> 
> [rbd_iodepth32]
> iodepth=32
> </rbd.fio>
> 
> Results from fio on image 'benchmark' without any snapshots:
> 
> rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.16
> Starting 1 process
> rbd engine: RBD version: 0.1.10
> Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/3620KB/0KB /s] [0/905/0 iops] [eta 00m:00s]
> rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=14192: Thu Apr 20 13:11:27 2017
>   write: io=8192.0MB, bw=1596.2KB/s, iops=399, runt=5252799msec
>     slat (usec): min=1, max=6708, avg=173.27, stdev=97.65
>     clat (msec): min=9, max=14505, avg=79.97, stdev=456.86
>      lat (msec): min=9, max=14505, avg=80.15, stdev=456.86
>     clat percentiles (msec):
>      |  1.00th=[   26],  5.00th=[   28], 10.00th=[   28], 20.00th=[   30],
>      | 30.00th=[   31], 40.00th=[   32], 50.00th=[   33], 60.00th=[   35],
>      | 70.00th=[   37], 80.00th=[   39], 90.00th=[   43], 95.00th=[   47],
>      | 99.00th=[ 1516], 99.50th=[ 3621], 99.90th=[ 7046], 99.95th=[ 8094],
>      | 99.99th=[10159]
>     lat (msec) : 10=0.01%, 20=0.29%, 50=96.17%, 100=1.49%, 250=0.31%
>     lat (msec) : 500=0.21%, 750=0.15%, 1000=0.14%, 2000=0.38%, >=2000=0.85%
>   cpu          : usr=31.95%, sys=58.32%, ctx=5392823, majf=0, minf=0
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
>      issued    : total=r=0/w=2097152/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
> 
> Run status group 0 (all jobs):
>   WRITE: io=8192.0MB, aggrb=1596KB/s, minb=1596KB/s, maxb=1596KB/s, mint=5252799msec, maxt=5252799msec
> 
> Disk stats (read/write):
>   vdb: ios=6/20, merge=0/29, ticks=76/12168, in_queue=12244, util=0.23%
> 
> sudo fio rbd.fio  2023.87s user 3216.33s system 99% cpu 1:27:31.92 total
> 
> Now I created three snapshots of image 'benchmark'. The cluster became unresponsive (slow requests started to appear), and a new
> run of fio never got past 0.0%.
> 
> Removed all three snapshots. The cluster became responsive again and fio started to work like before (I left it running during
> snapshot removal).
> 
> Created one snapshot of 'benchmark' while fio was running. The cluster became unresponsive after a few minutes; fio got nothing
> done as soon as the snapshot was made.
> 
> Stopped here ;)

You are generating write amplification of roughly 2000x: every 4 KB write I/O will generate a 4 MB read and a 4 MB write. If your cluster can't handle that I/O, you will see extremely poor performance. Is your real-life workload actually doing random 4 KB writes at qd=32? If it is, you will either want to use RBDs made up of smaller objects to try to lessen the overhead, or probably forget about using snapshots, unless there is some sort of sparse, bitmap-based COW feature on the horizon?
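
To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes the worst case described above, where each 4 KB client write copies one whole object (one full read plus one full write), and it reuses the 399 IOPS figure from the snapshot-free fio run purely as an illustration. The function names are made up for this example and are not part of Ceph or fio.

```python
# Back-of-the-envelope estimate of copy-on-write overhead on a snapshotted image.
# Assumption: every small client write copies one whole object
# (a full-object read followed by a full-object write).

KIB = 1024
MIB = 1024 * KIB


def cow_amplification(object_size, io_size):
    """Backend bytes moved (read + write) per byte written by the client."""
    return 2 * object_size / io_size


def backend_bytes_per_sec(iops, object_size):
    """Backend traffic needed to sustain `iops` client writes under full-object COW."""
    return iops * 2 * object_size


if __name__ == "__main__":
    io_size = 4 * KIB      # bs=4k from the fio job
    client_iops = 399      # average IOPS of the run without snapshots

    # Compare the 4 MB objects mentioned above with some smaller, illustrative sizes.
    for object_size in (4 * MIB, 1 * MIB, 512 * KIB, 64 * KIB):
        amp = cow_amplification(object_size, io_size)
        traffic = backend_bytes_per_sec(client_iops, object_size) / MIB
        print(f"object={object_size // KIB:>5} KiB  "
              f"amplification={amp:>7.0f}x  "
              f"backend traffic at {client_iops} IOPS={traffic:>7.1f} MiB/s")
```

With 4 MB objects the ratio works out to 2048x, which is where the "roughly 2000x" comes from. Smaller objects shrink the ratio proportionally, but it stays well above 1x for 4 KB writes.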

> 
> Regards,
> --
> J.Hofmüller
> 
>                mur.sat -- a space art project
>                http://sat.mur.at/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com