Replying to myself.

On Wed, 23 Nov 2016 18:50:02 +0100 grin <grin@xxxxxxx> wrote:

> This is possibly some network issue, but I cannot see the indicator
> about what to see. mon0 usually stands in quorum alone, and other mons
> cannot join. They get the monmap, they intend to join, but it just
> never happens, mons get from synchronising to probing, forever.
> Raising log level doesn't reveal anything to me.

Spoiler alert: it was an MTU problem.

A crappy switch (Cisco Nexus, by courtesy of Satan) actually wasn't able
to handle its configured MTU size, so the cluster members pushed packets
larger than the switch could really carry, and some of those obviously
failed to be delivered. After lowering the MTU on the nodes to the level
the crappy switch could actually handle, everything came alive.

As a side note: I was absolutely unable to see that messages were being
lost, let alone from where, to where, and which ones. I saw no indication
on the initiator side that a message had been sent but never
acknowledged(?) or acted upon, and no obvious sign on the destination
side that it was expecting a message which never arrived.

This may have been improved in later versions, but if it hasn't, maybe it
ought to be.

Thanks,
Peter (older by a day spent on that)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
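
For anyone chasing similar symptoms, here is a minimal sketch of one way to
check whether full-size packets really make it between two nodes. It is not
what was run in this thread. It assumes Linux with iputils ping (where
"-M do" sets Don't Fragment and "-s" sets the ICMP payload size), IPv4
without header options, and that the switch treats ICMP the same way it
treats the Ceph traffic. The host name "mon1" and the 576..9000 search range
are made up for illustration; the post does not say which MTU was configured.

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(host, mtu, count=3):
    """Build an iputils ping command that sends DF-flagged packets of size mtu."""
    return ["ping", "-M", "do", "-c", str(count), "-W", "2",
            "-s", str(icmp_payload_for_mtu(mtu)), host]


def probe(host, mtu):
    """Return True if at least one DF-flagged packet of size mtu was answered."""
    result = subprocess.run(ping_command(host, mtu),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_working_mtu(passes, low=576, high=9000):
    """Binary search for the largest MTU in [low, high] where passes(mtu) is True.

    Assumes that if a given size gets through, every smaller size does too.
    Returns None if even `low` fails.
    """
    if not passes(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if passes(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # "mon1" is a hypothetical peer; replace it with a real cluster member.
    print(largest_working_mtu(lambda mtu: probe("mon1", mtu)))
```

If the result is smaller than the MTU configured on the interfaces, something
on the path is dropping the larger packets, which matches the situation
described above. Running it in both directions between each pair of nodes is
probably worthwhile, since the loss does not have to be symmetric.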