David - I'm new to Ceph myself, so I can't point out any smoking guns, but your problem "feels" like a network issue.

I suggest you check all of your OSD/Mon/client network interfaces. Check for errors, and check that they are negotiating the same link speed/type with your switches (if you have LLDP enabled on your switches, this will help). Most importantly, check that your MTUs match - i.e., if you are using jumbo frames (e.g. 9000 MTU) on your hosts, make sure your switches *also* support that, with appropriate allowance for packet overhead (e.g. 9128). If you have your hosts set to 9000 and your switches to 1500, you'll see this exact behavior... I've put a few concrete command sketches for these checks at the bottom of this message, below the quoted thread.

Hopefully that helps some ...

~~shane

On 7/17/15, 8:57 AM, "ceph-users on behalf of J David" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of j.david.lists@xxxxxxxxx> wrote:

>On Fri, Jul 17, 2015 at 11:15 AM, Quentin Hartman
><qhartman@xxxxxxxxxxxxxxxxxxx> wrote:
>> That looks a lot like what I was seeing initially. The OSDs getting
>> marked out was relatively rare and it took a bit before I saw it.
>
>Our problem is "most of the time" and does not appear confined to a
>specific Ceph cluster node or OSD:
>
>$ sudo fgrep 'waiting for subops' ceph.log | sed -e 's/.* v4 //' |
>sort | uniq -c | sort -n
>      1 currently waiting for subops from 0
>      1 currently waiting for subops from 10
>      1 currently waiting for subops from 11
>      1 currently waiting for subops from 12
>      1 currently waiting for subops from 3
>      1 currently waiting for subops from 7
>      2 currently waiting for subops from 13
>      2 currently waiting for subops from 16
>      2 currently waiting for subops from 4
>      3 currently waiting for subops from 15
>      4 currently waiting for subops from 6
>      4 currently waiting for subops from 8
>      7 currently waiting for subops from 2
>
>Node f16: OSDs 0, 2, and 3 (3 out of 4)
>Node f17: OSDs 4, 6, 7, 8, 10, 11, 12, 13, and 15 (9 out of 12)
>Node f18: OSD 16 (1 out of 12)
>
>So f18 seems like the odd man out, in that it has *fewer* problems
>than the other two.
>
>There is a grand total of 2 RX errors across all the interfaces on
>all three machines. (Each one has dual 10G interfaces bonded together
>as active/failover.)
>
>The OSD log for the worst offender above (osd.2) says:
>
>2015-07-17 08:52:05.441607 7f562ea0c700  0 log [WRN] : 1 slow
>requests, 1 included below; oldest blocked for > 30.119568 secs
>
>2015-07-17 08:52:05.441622 7f562ea0c700  0 log [WRN] : slow request
>30.119568 seconds old, received at 2015-07-17 08:51:35.321991:
>osd_sub_op(client.32913524.0:3149584 2.249
>2792c249/rbd_data.15322ae8944a.000000000011b487/head//2 [] v
>10705'944603 snapset=0=[]:[] snapc=0=[]) v11 currently started
>
>2015-07-17 08:52:43.229770 7f560833f700  0 --
>192.168.2.216:6813/16029552 >> 192.168.2.218:6810/7028653
>pipe(0x25265180 sd=25 :6813 s=2 pgs=23894 cs=41 l=0
>c=0x22be4c60).fault with nothing to send, going to standby
>
>There are a bunch of those "fault with nothing to send, going to
>standby" messages.
>
>> The messages were like "So-and-so incorrectly marked us
>> out" IIRC.
>
>Nothing like that. Nor, with "ceph -w" running constantly, is there
>any reference to anything being marked out at any point, even when
>problems are severe.
>
>Thanks!
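
P.S. The concrete checks I mentioned above. These are just sketches,
not gospel: I'm assuming Linux hosts with ethtool installed, and
"eth0" below is a placeholder for whatever your bond slaves are
actually called. First, per-interface negotiated speed/duplex and
link state, plus the NIC's own low-level counters (ethtool -S often
shows drops and CRC errors that the kernel-level counters miss):

$ sudo ethtool eth0 | egrep 'Speed|Duplex|Link detected'
$ sudo ethtool -S eth0 | egrep -i 'err|drop|crc' | grep -v ': 0$'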
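
Next, the configured MTU on every interface - these need to match
across hosts, and the switches need to carry at least as much:

$ ip link | awk '/^[0-9]+:/ {print $2, $4, $5}'

To prove jumbo frames actually survive the host -> switch -> host
path, ping a peer with the don't-fragment bit set and a payload sized
for a 9000-byte MTU (9000 minus 20 bytes of IP header and 8 bytes of
ICMP header leaves 8972). I've used the 192.168.2.218 address from
your OSD log as the example peer; substitute whichever cluster-network
address you like. If the switch path is stuck at 1500, this fails
immediately while a plain ping works fine:

$ ping -M do -s 8972 -c 3 192.168.2.218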
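
Since you're on active/backup bonds, it's also worth confirming the
bond-level counters and that the bonds aren't quietly flapping between
slaves - a nonzero link failure count would fit an intermittent,
"most of the time" problem. "bond0" here is a guess at your bond
device name:

$ ip -s link show bond0
$ egrep 'Currently Active Slave|Link Failure Count' /proc/net/bonding/bond0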
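
Finally, on the Ceph side, a quick way to rule disks in or out: ceph
osd perf reports per-OSD commit/apply latency, so if osd.2's subop
waits were disk-bound rather than network-bound, it should stand out
there as well:

$ ceph osd perf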