Re: 1256 OSD/21 server ceph cluster performance issues.

I am trying to understand the drive throttle markers that were
mentioned, to get an idea of why these drives are marked as slow.

Here is the iostat of the drive /dev/sdbm::
http://paste.ubuntu.com/9607168/
 
An iowait of 0.79 doesn't seem bad, but a write wait (w_await) of 21.52
seems really high. Looking at the ops in flight::
http://paste.ubuntu.com/9607253/
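
In case anyone wants to reproduce the check, something along these lines
pulls the in-flight ops off the admin socket (osd.228 and having the
ceph CLI on the OSD node are assumptions, adjust as needed)::

-----------------------------------------------------------------------
#!/usr/bin/env python
# Rough sketch: dump the in-flight ops for one suspect OSD via its admin
# socket and print each op's age and description. osd.228 is just the
# OSD I happen to be looking at.
import json
import subprocess

out = subprocess.check_output(["ceph", "daemon", "osd.228",
                               "dump_ops_in_flight"])
dump = json.loads(out)
print("osd.228 has %d ops in flight" % dump.get("num_ops", 0))
for op in dump.get("ops", []):
    print("  age %s: %s" % (op.get("age"), op.get("description")))
-----------------------------------------------------------------------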


If we check against all of the OSDs on this node, this seems strange::
http://paste.ubuntu.com/9607331/
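
Roughly, that check amounts to walking the admin sockets under
/var/run/ceph and counting ops per OSD; a minimal sketch, assuming the
default socket path::

-----------------------------------------------------------------------
#!/usr/bin/env python
# Rough sketch: count in-flight ops for every OSD admin socket on this
# node so any outlier stands out. Assumes the default /var/run/ceph
# socket location.
import glob
import json
import re
import subprocess

for sock in sorted(glob.glob("/var/run/ceph/ceph-osd.*.asok")):
    osd_id = re.search(r"ceph-osd\.(\d+)\.asok", sock).group(1)
    out = subprocess.check_output(["ceph", "--admin-daemon", sock,
                                   "dump_ops_in_flight"])
    print("osd.%s: %s ops in flight" % (osd_id, json.loads(out)["num_ops"]))
-----------------------------------------------------------------------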

I do not understand why this node has ops in flight while the remainder
seem to be performing without issue. The load on the node is pretty
light as well, with an average CPU at 16 and an average iowait of
0.79::

-----------------------------------------------------------------------
/var/run/ceph# iostat -xm /dev/sdbm
Linux 3.13.0-40-generic (kh10-4)     12/23/2014     _x86_64_    (40 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.94    0.00   23.30    0.79    0.00   71.97

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdbm              0.09     0.25    5.03    3.42     0.55     0.63   288.02     0.09   10.56    2.55   22.32   2.54   2.15
-----------------------------------------------------------------------

I am still trying to understand the OSD throttle perfdump, so if anyone
can help shed some light on this, that would be rad. From what I can
tell from the perfdump, four OSDs stand out (the last one, 228, being
the slow one currently). I ended up pulling osd.228 from the cluster and
I have yet to see another slow/blocked OSD in the output of ceph -s. The
cluster is still rebuilding since I just pulled osd.228 out, but I am
still getting at least 200MB/s via bonnie while the rebuild is
occurring.
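
For what it is worth, the throttle sections of the perf dump can be
skimmed with something like the sketch below, flagging any throttle that
has actually made requests wait (the counter names are taken from the
dumps I am looking at, so treat them as assumptions if your version
differs)::

-----------------------------------------------------------------------
#!/usr/bin/env python
# Rough sketch: pull the "throttle-*" sections out of each OSD's perf
# dump and print the ones whose "wait" counter is non-zero, i.e.
# throttles that have actually blocked requests. The wait/avgcount, val
# and max fields are assumed from the dumps I am looking at.
import glob
import json
import re
import subprocess

for sock in sorted(glob.glob("/var/run/ceph/ceph-osd.*.asok")):
    osd_id = re.search(r"ceph-osd\.(\d+)\.asok", sock).group(1)
    perf = json.loads(subprocess.check_output(
        ["ceph", "--admin-daemon", sock, "perf", "dump"]))
    for name, ctrs in perf.items():
        if not name.startswith("throttle-"):
            continue
        waits = ctrs.get("wait", {}).get("avgcount", 0)
        if waits:
            print("osd.%s %s: %d waits (val %d / max %d)"
                  % (osd_id, name, waits,
                     ctrs.get("val", 0), ctrs.get("max", 0)))
-----------------------------------------------------------------------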

Finally, in case this helps anyone: a single 1GB upload takes around
2.0 - 2.5 minutes, but if we split a 10GB file into 100 x 100MB parts we
get a completion time of about 1 minute. That works out to a 10GB file
in about 1-1.5 minutes, or roughly 166.66MB/s, versus the 8MB/s I was
getting before with sequential uploads. All of these uploads are coming
from a single client via boto, which leads me to think that this is a
radosgw issue specifically.
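
The parallel upload is along the lines of the sketch below, using boto's
S3 multipart API against radosgw (the endpoint, credentials, bucket name
and 100MB part size are placeholders, and this is the general approach
rather than my exact script)::

-----------------------------------------------------------------------
#!/usr/bin/env python
# Sketch of the split-into-parts upload against radosgw with boto 2's
# S3 multipart API. Endpoint, credentials, bucket and part size are
# placeholders.
import math
import os
from multiprocessing import Pool

import boto
import boto.s3.connection
from boto.s3.multipart import MultiPartUpload

PART_SIZE = 100 * 1024 * 1024  # 100MB parts

def connect():
    return boto.connect_s3(
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
        host="radosgw.example.com",
        calling_format=boto.s3.connection.OrdinaryCallingFormat())

def upload_part(args):
    key_name, upload_id, part_num, offset, length, path = args
    bucket = connect().get_bucket("testbucket")
    # Re-attach to the existing multipart upload from this worker.
    mp = MultiPartUpload(bucket)
    mp.key_name, mp.id = key_name, upload_id
    with open(path, "rb") as fp:
        fp.seek(offset)
        mp.upload_part_from_file(fp, part_num, size=length)

def parallel_upload(path, key_name, workers=10):
    bucket = connect().get_bucket("testbucket")
    mp = bucket.initiate_multipart_upload(key_name)
    size = os.path.getsize(path)
    parts = int(math.ceil(size / float(PART_SIZE)))
    jobs = [(key_name, mp.id, i + 1, i * PART_SIZE,
             min(PART_SIZE, size - i * PART_SIZE), path)
            for i in range(parts)]
    Pool(workers).map(upload_part, jobs)
    mp.complete_upload()  # part list is fetched back from the gateway

if __name__ == "__main__":
    parallel_upload("10gb-testfile.bin", "10gb-testfile.bin")
-----------------------------------------------------------------------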

This again makes me think that this is not a slow disk issue but an
overall radosgw issue. If this were structural in any way, I would
expect all of rados/ceph's facilities to be hit, and the 8MB/s limit per
client would be down to client throttling because some ceiling had been
reached. As it turns out I am not hitting a ceiling; some other aspect
of radosgw or boto is limiting my throughput. Is this logic not correct?
I feel like I am missing something.

Thanks for the help everyone!




