Re: Deadly slow Ceph cluster revisited

On 07/17/2015 09:55 AM, J David wrote:
On Fri, Jul 17, 2015 at 10:21 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
rados -p <pool> bench 30 write

just to see how it handles 4MB object writes.

Here's that, from the VM host:

Total time run:         52.062639
Total writes made:      66
Write size:             4194304
Bandwidth (MB/sec):     5.071

Yep, awfully slow!


Stddev Bandwidth:       11.6312
Max bandwidth (MB/sec): 80
Min bandwidth (MB/sec): 0
Average Latency:        12.436

12 second average latency! Yikes. That does either sound like network or one of the disks is very slow. Do you see faster performance during the first second or two of the rados bench run? That might indicate that you are backing up on a specific OSD.
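One quick way to check for a slow outlier (a sketch, assuming a release where `ceph osd perf` reports per-OSD fs_commit_latency/fs_apply_latency in milliseconds, as Hammer does):

```shell
# Sort OSDs by fs_commit_latency(ms) so a slow outlier floats to the top.
# Against the live cluster you would run:
#   ceph osd perf | tail -n +2 | sort -k2 -nr | head
# Illustrative sample data substituted here via a here-doc
# (columns: osd  fs_commit_latency(ms)  fs_apply_latency(ms)):
sort -k2 -nr <<'EOF' | head -n 1
0 3 4
1 812 950
2 5 6
EOF
# In this sample, osd.1 sorts first with 812 ms commit latency.
```

If one OSD consistently shows commit latency an order of magnitude above the rest, that disk (or its node) is the place to look first.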

Stddev Latency:         13.6272
Max latency:            51.6924
Min latency:            0.073353

Unfortunately I don't know much about how to parse this (other than
that 5MB/sec writes match up with our best-case performance in the VM
guest).
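For what it's worth, the headline number is just total data over total time: 66 completed 4MB object writes over 52.06 seconds:

```shell
# Bandwidth (MB/sec) = (writes made * write size in MB) / total time run
awk 'BEGIN { printf "%.3f\n", 66 * 4 / 52.062639 }'   # prints 5.071
```

The latency lines are per-operation: with rados bench's default of 16 concurrent ops in flight, a 12.4 s average per 4MB write is consistent with the ~5 MB/s aggregate.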

If rados bench is
also terribly slow, then you might want to start looking for evidence of IO
getting hung up on a specific disk or node.

Thus far, no evidence of that has presented itself: iostat looks good
on every drive, and the nodes are all equally loaded.

Ok. Maybe try some iperf tests between the different OSD nodes in your cluster, and also from the client to the OSDs.
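Roughly like this (a sketch; substitute your own host names, and adjust for whichever iperf version you have installed):

```shell
# On each OSD node in turn, start a server:
#   iperf -s
# Then, from every other OSD node and from the VM host/client:
#   iperf -c <osd-node> -t 10
# Compare the results against what rados bench reports. Note iperf
# prints Gbit/s while rados bench prints MB/s; for example, a full
# 10GbE link at ~9.4 Gbit/s works out to:
awk 'BEGIN { printf "%.0f\n", 9.4 * 1000 / 8 }'   # 1175 MB/s of raw line rate
```

If any node pair comes in far below line rate, or one direction is much worse than the other, that link is worth digging into before blaming the disks.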


Thanks!

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


