Re: new mon can't join new cluster, probe_timeout / probing

Replying to myself.

On Wed, 23 Nov 2016 18:50:02 +0100
grin <grin@xxxxxxx> wrote:

> This is possibly some network issue, but I cannot see the indicator
> about what to see. mon0 usually stands in quorum alone, and other mons
> cannot join. They get the monmap, they intend to join, but it just
> never happens, mons get from synchronising to probing, forever.
> Raising log level doesn't reveal anything to me.

Spoiler alert: it was an MTU problem. 

A crappy switch (Cisco Nexus, by courtesy of Satan) wasn't actually able
to handle its designated MTU size, so the cluster members pushed packets
larger than the switch could really pass, and some of those were never
delivered. After lowering the MTU on the nodes to what the crappy switch
could actually handle, everything came alive.
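
For anyone chasing something similar, here is a rough sketch of how such a
mismatch could be probed from a node. It is not what I ran at the time. It
assumes Linux iputils ping (where "-M do" sets the Don't Fragment bit) and
plain IPv4 without header options. The function names, and the host and
MTU values in the usage comment, are made up for illustration.

```python
import subprocess
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host: str, mtu: int, count: int = 3) -> List[str]:
    """Build a ping command that fills exactly one MTU-sized packet.

    -M do  forbids fragmentation (Linux iputils ping),
    -s     sets the ICMP payload size,
    -c     sets the number of echo requests.
    """
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host]


def path_carries_mtu(host: str, mtu: int) -> bool:
    """Return True if full-size, unfragmented pings to host get replies."""
    result = subprocess.run(df_ping_command(host, mtu), capture_output=True)
    return result.returncode == 0


# Hypothetical usage, run from one mon against another:
#   path_carries_mtu("mon1", 9000)   # False would point at the path MTU
#   path_carries_mtu("mon1", 1500)   # True here narrows it down further
```

If the full-size ping fails while a smaller one succeeds, something on the
path is not carrying the MTU the nodes are configured for.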

As a side note, I was absolutely unable to see that messages were being
lost, let alone which ones or between which hosts. I saw no indication
on the initiator side that a message had been sent but never
acknowledged(?) or acted upon, and no obvious sign that the destination
was expecting a message that never arrived.

This may have been improved in later versions, but if it hasn't,
maybe it ought to be.

Thanks,
Peter 
(older by a day spent on that)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


