Hi,

On Tuesday, 2017-04-18 at 18:34 +0000, Peter Maloney wrote:
> The 'slower with every snapshot even after CoW totally flattens it'
> issue I just find easy to test, and I didn't test it on hammer or
> earlier, and others confirmed it, but didn't keep track of the
> versions. Just make an rbd image, map it (probably... but my tests
> were with qemu librbd), do fio randwrite tests with sync and direct
> on the device (no need for a fs, or anything), and then make a few
> snaps and watch it go way slower.
>
> How about we make this thread a collection of versions then. And I'll
> redo my test on Thursday maybe.

I have now run some tests; here are the results and observations.

This is the fio config file I used:

<rbd.fio>
[global]
ioengine=rbd
clientname=admin
pool=images
rbdname=benchmark
invalidate=0    # mandatory
rw=randwrite
bs=4k

[rbd_iodepth32]
iodepth=32
</rbd.fio>

Results from fio on image 'benchmark' without any snapshots:

rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.16
Starting 1 process
rbd engine: RBD version: 0.1.10
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/3620KB/0KB /s] [0/905/0 iops] [eta 00m:00s]
rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=14192: Thu Apr 20 13:11:27 2017
  write: io=8192.0MB, bw=1596.2KB/s, iops=399, runt=5252799msec
    slat (usec): min=1, max=6708, avg=173.27, stdev=97.65
    clat (msec): min=9, max=14505, avg=79.97, stdev=456.86
     lat (msec): min=9, max=14505, avg=80.15, stdev=456.86
    clat percentiles (msec):
     |  1.00th=[   26],  5.00th=[   28], 10.00th=[   28], 20.00th=[   30],
     | 30.00th=[   31], 40.00th=[   32], 50.00th=[   33], 60.00th=[   35],
     | 70.00th=[   37], 80.00th=[   39], 90.00th=[   43], 95.00th=[   47],
     | 99.00th=[ 1516], 99.50th=[ 3621], 99.90th=[ 7046], 99.95th=[ 8094],
     | 99.99th=[10159]
    lat (msec) : 10=0.01%, 20=0.29%, 50=96.17%, 100=1.49%, 250=0.31%
    lat (msec) : 500=0.21%, 750=0.15%, 1000=0.14%, 2000=0.38%, >=2000=0.85%
  cpu          : usr=31.95%, sys=58.32%, ctx=5392823, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=2097152/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=8192.0MB, aggrb=1596KB/s, minb=1596KB/s, maxb=1596KB/s, mint=5252799msec, maxt=5252799msec

Disk stats (read/write):
  vdb: ios=6/20, merge=0/29, ticks=76/12168, in_queue=12244, util=0.23%

sudo fio rbd.fio  2023.87s user 3216.33s system 99% cpu 1:27:31.92 total

Then I created three snapshots of image 'benchmark'. The cluster became
unresponsive (slow requests started to appear), and a new run of fio never
got past 0.0%.

After removing all three snapshots the cluster became responsive again, and
fio started to work like before (I left it running during the snapshot
removal).

Next I created one snapshot of 'benchmark' while fio was running. The
cluster became unresponsive after a few minutes; fio got nothing done as
soon as the snapshot was made.

Stopped here ;)

Regards,
--
J.Hofmüller

mur.sat -- a space art project
http://sat.mur.at/
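P.S.: As a sanity check, the baseline numbers fio reported are internally
consistent: 8192 MB of 4 KiB random writes is 2097152 I/Os (matching the
"issued ... w=2097152" line), and dividing by the ~5253 s runtime gives the
reported ~399 IOPS. A small sketch in Python, with the constants copied
straight from the fio output above:

```python
# Cross-check fio's summary line:
#   write: io=8192.0MB, bw=1596.2KB/s, iops=399, runt=5252799msec
total_io_bytes = 8192 * 1024 * 1024   # io=8192.0MB (fio uses binary MB here)
block_size = 4 * 1024                 # bs=4k
runtime_s = 5252799 / 1000            # runt=5252799msec

num_writes = total_io_bytes // block_size   # 2097152, matches "issued w="
iops = num_writes / runtime_s               # ~399, matches "iops=399"
bw_kib_s = iops * block_size / 1024         # ~1597 KiB/s, matches "bw=1596.2KB/s"

print(num_writes, round(iops), round(bw_kib_s))
```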
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com