Dear Ceph experts,

We've found that a single client running rados bench can drive other users, e.g. RBD users, into slow requests.

Starting with a cluster that is not particularly busy, e.g.:

  2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used, 2850 TB / 3075 TB avail; 4880 KB/s rd, 28632 KB/s wr, 271 op/s

we start a rados bench writing many small objects:

  rados bench -p test 60 write -t 500 -b 1024 --no-cleanup

which gives these results (note the >60s max latency!!):

  Total time run:         86.351424
  Total writes made:      91425
  Write size:             1024
  Bandwidth (MB/sec):     1.034
  Stddev Bandwidth:       1.26486
  Max bandwidth (MB/sec): 7.14941
  Min bandwidth (MB/sec): 0
  Average Latency:        0.464847
  Stddev Latency:         3.04961
  Max latency:            66.4363
  Min latency:            0.003188

Thirty seconds into this bench we start seeing slow requests, not only from the bench writes but also from some poor RBD clients, e.g.:

  2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641: osd_sub_op(client.18535427.0:3922272 4.d42 4eb00d42/rbd_data.11371325138b774.0000000000006577/head//4 [] v 42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent

During a longer, many-hour run of this small-write test, some of these slow RBD writes became very user visible, with disk flushes being blocked long enough (>120s) for the VM kernels to start complaining.

A rados bench from a 10GbE client writing 4MB objects doesn't have the same long tail of latency:

  # rados bench -p test 60 write -t 500 --no-cleanup
  ...
  Total time run:         62.811466
  Total writes made:      8553
  Write size:             4194304
  Bandwidth (MB/sec):     544.678
  Stddev Bandwidth:       173.163
  Max bandwidth (MB/sec): 1000
  Min bandwidth (MB/sec): 0
  Average Latency:        3.50719
  Stddev Latency:         0.309876
  Max latency:            8.04493
  Min latency:            0.166138

and there are zero slow requests, at least during this 60s run.

While the vast majority of small writes complete with reasonable sub-second latency, what is causing the very long tail (60-120s!!) seen by a few writes? Can someone advise us where to look in the perf dump, etc., to find which resource/queue is being exhausted during these tests?

Oh yeah, we're running the latest stable dumpling, 0.67.5, on the servers.

Thanks in advance!

Best Regards,
Dan

--
Dan van der Ster || Data & Storage Services || CERN IT Department
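
P.S. For anyone wanting to poke at the same question, below is a rough sketch of how one might watch an OSD's queues and in-flight ops through the admin socket while the bench runs. The socket path and the osd id (483) are just examples, and the counter/option names in the greps are from memory and may differ between versions; adjust to whatever your perf dump actually reports.

  # example osd and default admin socket path -- adjust for your deployment
  ASOK=/var/run/ceph/ceph-osd.483.asok

  # ops currently stuck in the OSD, and the slowest recently completed ones,
  # with the timestamped events each op passed through (shows where it waited)
  ceph --admin-daemon $ASOK dump_ops_in_flight > ops_in_flight.json
  ceph --admin-daemon $ASOK dump_historic_ops > historic_ops.json

  # snapshot the perf counters; sampling this repeatedly during the bench and
  # watching the journal/filestore queue depths should show whether a queue
  # is filling up (counter names may vary by version)
  ceph --admin-daemon $ASOK perf dump > perf_dump.json
  grep -E 'journal_queue|op_queue' perf_dump.json

  # the configured queue limits to compare against
  ceph --admin-daemon $ASOK config show | grep -E 'queue_max|journal_queue'

Comparing the sampled queue depths against the filestore/journal queue_max settings during the small-write run is one way to see which limit is being hit first.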