Dear Ceph experts,

We've found that a single client running rados bench can drive other users, e.g. RBD users, into slow requests.

Starting with a cluster that is not particularly busy, e.g.:

  2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used, 2850 TB / 3075 TB avail; 4880 KB/s rd, 28632 KB/s wr, 271 op/s

we start a rados bench writing many small objects:

  rados bench -p test 60 write -t 500 -b 1024 --no-cleanup

which gives these results (note the >60s max latency!!):

  Total time run:         86.351424
  Total writes made:      91425
  Write size:             1024
  Bandwidth (MB/sec):     1.034
  Stddev Bandwidth:       1.26486
  Max bandwidth (MB/sec): 7.14941
  Min bandwidth (MB/sec): 0
  Average Latency:        0.464847
  Stddev Latency:         3.04961
  Max latency:            66.4363
  Min latency:            0.003188

Thirty seconds into this bench we start seeing slow requests, not only from the bench writes but also from some poor RBD clients, e.g.:

  2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641: osd_sub_op(client.18535427.0:3922272 4.d42 4eb00d42/rbd_data.11371325138b774.0000000000006577/head//4 [] v 42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent

During a longer, many-hour run of this small-write test, some of these slow RBD writes became very user visible, with disk flushes being blocked long enough (>120s) for the VM kernels to start complaining.

A rados bench from a 10GbE client writing 4MB objects doesn't have the same long tail of latency:

  # rados bench -p test 60 write -t 500 --no-cleanup
  ...
  Total time run:         62.811466
  Total writes made:      8553
  Write size:             4194304
  Bandwidth (MB/sec):     544.678
  Stddev Bandwidth:       173.163
  Max bandwidth (MB/sec): 1000
  Min bandwidth (MB/sec): 0
  Average Latency:        3.50719
  Stddev Latency:         0.309876
  Max latency:            8.04493
  Min latency:            0.166138

and there are zero slow requests, at least during this 60s run.

While the vast majority of small writes complete with reasonable sub-second latency, what is causing the very long tail (60-120s!!) seen by a few writes? Can someone advise us where to look in the perf dump, etc., to find which resource/queue is being exhausted during these tests?

Oh yeah, we're running the latest stable dumpling, 0.67.5, on the servers.

Thanks in advance!

Best Regards,
Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department
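
P.S. For anyone wanting to poke at the same question, below is a rough sketch of how one might watch an OSD's queues and in-flight ops through the admin socket while the bench runs. The socket path and the osd id (483) are just examples, and the counter/option names in the greps are from memory and may differ between versions; adjust to whatever your perf dump actually reports.

  # example osd and default admin socket path -- adjust for your deployment
  ASOK=/var/run/ceph/ceph-osd.483.asok

  # ops currently stuck in the OSD, and the slowest recently completed ones,
  # with the timestamped events each op passed through (shows where it waited)
  ceph --admin-daemon $ASOK dump_ops_in_flight > ops_in_flight.json
  ceph --admin-daemon $ASOK dump_historic_ops > historic_ops.json

  # snapshot the perf counters; sampling this repeatedly during the bench and
  # watching the journal/filestore queue depths should show whether a queue
  # is filling up (counter names may vary by version)
  ceph --admin-daemon $ASOK perf dump > perf_dump.json
  grep -E 'journal_queue|op_queue' perf_dump.json

  # the configured queue limits to compare against
  ceph --admin-daemon $ASOK config show | grep -E 'queue_max|journal_queue'

Comparing the sampled queue depths against the filestore/journal queue_max settings during the small-write run is one way to see which limit is being hit first.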