Re: Deadly slow Ceph cluster revisited

What does "ceph status" say? I had a problem with similar symptoms some months ago that was accompanied by OSDs getting marked out for no apparent reason and the cluster going into a HEALTH_WARN state intermittently. Ultimately the root of the problem ended up being a faulty NIC. Once I took that out of the picture everything started flying right.

QH

On Fri, Jul 17, 2015 at 8:21 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
On 07/17/2015 08:38 AM, J David wrote:
This is the same cluster I posted about back in April.  Since then,
the situation has gotten significantly worse.

Here is what iostat looks like for the one active RBD image on this cluster:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdb               0.00     0.00   14.10    0.00   685.65     0.00    97.26     3.43  299.40  299.40    0.00  70.92 100.00
vdb               0.00     0.00    1.10    0.00   140.80     0.00   256.00     3.00 2753.09 2753.09    0.00 909.09 100.00
vdb               0.00     0.00   17.40    0.00  2227.20     0.00   256.00     3.00  178.78  178.78    0.00  57.47 100.00
vdb               0.00     0.00    1.30    0.00   166.40     0.00   256.00     3.00 2256.62 2256.62    0.00 769.23 100.00
vdb               0.00     0.00    8.20    0.00  1049.60     0.00   256.00     3.00  362.10  362.10    0.00 121.95 100.00
vdb               0.00     0.00    1.10    0.00   140.80     0.00   256.00     3.00 2517.45 2517.45    0.00 909.45 100.04
vdb               0.00     0.00    1.10    0.00   140.66     0.00   256.00     3.00 2863.64 2863.64    0.00 909.09  99.90
vdb               0.00     0.00    0.70    0.00    89.60     0.00   256.00     3.00 3898.86 3898.86    0.00 1428.57 100.00
vdb               0.00     0.00    0.60    0.00    76.80     0.00   256.00     3.00 5093.33 5093.33    0.00 1666.67 100.00
vdb               0.00     0.00    1.20    0.00   153.60     0.00   256.00     3.00 2568.33 2568.33    0.00 833.33 100.00
vdb               0.00     0.00    1.30    0.00   166.40     0.00   256.00     3.00 2457.85 2457.85    0.00 769.23 100.00
vdb               0.00     0.00   13.90    0.00  1779.20     0.00   256.00     3.00  220.95  220.95    0.00  71.94 100.00
vdb               0.00     0.00    1.00    0.00   128.00     0.00   256.00     3.00 2250.40 2250.40    0.00 1000.00 100.00
vdb               0.00     0.00    1.30    0.00   166.40     0.00   256.00     3.00 2798.77 2798.77    0.00 769.23 100.00
vdb               0.00     0.00    0.90    0.00   115.20     0.00   256.00     3.00 3304.00 3304.00    0.00 1111.11 100.00
vdb               0.00     0.00    0.90    0.00   115.20     0.00   256.00     3.00 3425.33 3425.33    0.00 1111.11 100.00
vdb               0.00     0.00    1.30    0.00   166.40     0.00   256.00     3.00 2290.77 2290.77    0.00 769.23 100.00
vdb               0.00     0.00    4.30    0.00   550.40     0.00   256.00     3.00  721.30  721.30    0.00 232.56 100.00
vdb               0.00     0.00    1.60    0.00   204.80     0.00   256.00     3.00 1894.75 1894.75    0.00 625.00 100.00
vdb               0.00     0.00    1.20    0.00   153.60     0.00   256.00     3.00 2375.00 2375.00    0.00 833.33 100.00
vdb               0.00     0.00    0.90    0.00   115.20     0.00   256.00     3.00 3036.44 3036.44    0.00 1111.11 100.00
vdb               0.00     0.00    1.10    0.00   140.80     0.00   256.00     3.00 3086.18 3086.18    0.00 909.09 100.00
vdb               0.00     0.00    0.90    0.00   115.20     0.00   256.00     3.00 2480.44 2480.44    0.00 1111.11 100.00
vdb               0.00     0.00    1.20    0.00   153.60     0.00   256.00     3.00 3124.33 3124.33    0.00 833.67 100.04
vdb               0.00     0.00    0.80    0.00   102.40     0.00   256.00     3.00 3228.00 3228.00    0.00 1250.00 100.00
vdb               0.00     0.00    1.20    0.00   153.60     0.00   256.00     3.00 2439.33 2439.33    0.00 833.33 100.00
vdb               0.00     0.00    1.30    0.00   166.40     0.00   256.00     3.00 2567.08 2567.08    0.00 769.23 100.00
vdb               0.00     0.00    0.80    0.00   102.40     0.00   256.00     3.00 3023.00 3023.00    0.00 1250.00 100.00
vdb               0.00     0.00    4.80    0.00   614.40     0.00   256.00     3.00  712.50  712.50    0.00 208.33 100.00
vdb               0.00     0.00    1.30    0.00   118.75     0.00   182.69     3.00 2003.69 2003.69    0.00 769.23 100.00
vdb               0.00     0.00   10.50    0.00  1344.00     0.00   256.00     3.00  344.46  344.46    0.00  95.24 100.00

So, between 0 and 15 reads per second, no write activity, a constant queue depth of 3+, wait times measured in seconds, and 100% I/O utilization, all for read throughput of 100-200 KB/sec.  Even trivial writes can hang for 15-60 seconds before completing.

Sometimes this behavior will "go away" for a while and things revert to what we saw in April: 50 IOPS (read or write) and 5-20 MB/sec of I/O throughput.  But it always comes back.

The hardware of the ceph cluster is:
- Three ceph nodes
- Two of the ceph nodes have 64GiB RAM and 12 5TB SATA drives
- One of the ceph nodes has 32GiB RAM and 4 5TB SATA drives
- All ceph nodes have Intel E5-2609 v2 (2.50Ghz quad-core) CPUs
- Everything is 10GBase-T
- All three nodes running Ceph 0.80.9

The ceph hardware is all borderline idle.  The CPU is 3-5% utilized, and iostat reports that the individual disks hover around 4-7% utilization at any given time.  The nodes do appear to be using most of the available RAM for OSD caching.

The client is a KVM virtual machine running alone on its own host.  Inside the virtual machine, it reports 100% CPU utilization in iowait.  Outside the virtual machine, the host reports everything is idle (99.1% idle).

Something is *definitely* wrong.  Does anyone have any idea what it might be?

Thanks for any help with this!

Hi J David,

Forgive me if you covered this in April, but have you tried rados bench from the hypervisor (or another client node)?

Something like:

rados bench -p <pool> 30 write

just to see how it handles 4MB object writes.  You can play around with the -t (concurrent operations) and -b (object size) parameters to try different object workloads.  If rados bench is also terribly slow, then you might want to start looking for evidence of IO getting hung up on a specific disk or node.
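
For example (the pool name is just a placeholder; by default rados bench uses 4MB objects and 16 concurrent operations):

rados bench -p <pool> 30 write -t 32
rados bench -p <pool> 30 write -t 16 -b 4096

The first run doubles the number of in-flight operations; the second keeps the default concurrency but drops the object size to 4KB to approximate small-block traffic.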

Mark


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
