Hello,

[This is hammer, 0.94.9, since Proxmox is waiting for the new Jewel release due to some relevant fixes.]

This is possibly some network issue, but I can't see any indicator of where to look. mon0 usually stands in quorum alone, and the other mons cannot join. They get the monmap and they intend to join, but it just never happens; the mons go from synchronizing to probing, forever. Raising the log level doesn't reveal anything to me. The cluster network and the public network differ, and the mons are supposed to be on the public network.

mon0:

2016-11-23 16:26:16.920691 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints
2016-11-23 16:26:18.922057 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints
2016-11-23 16:26:20.923695 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints
2016-11-23 16:26:22.925172 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints

...forever.

mon1:

2016-11-23 16:25:14.887453 7fe81a87f880  0 ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90), process ceph-mon, pid 8956
2016-11-23 16:25:14.909873 7fe81a87f880  0 mon.1 does not exist in monmap, will attempt to join an existing cluster
2016-11-23 16:25:14.910934 7fe81a87f880  0 using public_addr 10.75.13.132:0/0 -> 10.75.13.132:6789/0
2016-11-23 16:25:14.911012 7fe81a87f880  0 starting mon.1 rank -1 at 10.75.13.132:6789/0 mon_data /var/lib/ceph/mon/ceph-1 fsid ca404f45-def3-4c22-a83b-7939e3f92514
2016-11-23 16:25:14.911406 7fe81a87f880  1 mon.1@-1(probing) e0 preinit fsid ca404f45-def3-4c22-a83b-7939e3f92514
2016-11-23 16:26:14.912255 7fe8137f8700  0 mon.1@-1(synchronizing).data_health(0) update_stats avail 92% total 69923 MB, used 1552 MB, avail 64796 MB
2016-11-23 16:27:14.912613 7fe8137f8700  0 mon.1@-1(probing).data_health(0) update_stats avail 92% total 69923 MB, used 1552 MB, avail 64796 MB
2016-11-23 16:28:14.912868 7fe8137f8700  0 mon.1@-1(probing).data_health(0) update_stats avail 92% total 69923 MB, used 1552 MB, avail 64796 MB

...forever as well.

With the debug level raised on mon0:

2016-11-23 17:19:11.330786 7f8f20c60700 10 _calc_signature seq 366 front_crc_ = 1411686358 middle_crc = 0 data_crc = 0 sig = 16063324873821844002
2016-11-23 17:19:11.330928 7f8f193da700 20 mon.0@0(leader) e1 have connection
2016-11-23 17:19:11.330937 7f8f193da700 20 mon.0@0(leader) e1 ms_dispatch existing session MonSession: mon.? 10.75.13.132:6789/0 is open allow * for mon.? 10.75.13.132:6789/0
2016-11-23 17:19:11.330947 7f8f193da700 20 mon.0@0(leader) e1  caps allow *
2016-11-23 17:19:11.330953 7f8f193da700 20 is_capable service=mon command= read on cap allow *
2016-11-23 17:19:11.330956 7f8f193da700 20  allow so far , doing grant allow *
2016-11-23 17:19:11.330958 7f8f193da700 20  allow all
2016-11-23 17:19:11.330961 7f8f193da700 10 mon.0@0(leader) e1 handle_probe mon_probe(probe ca404f45-def3-4c22-a83b-7939e3f92514 name 1 new) v6
2016-11-23 17:19:11.330969 7f8f193da700 10 mon.0@0(leader) e1 handle_probe_probe mon.? 10.75.13.132:6789/0mon_probe(probe ca404f45-def3-4c22-a83b-7939e3f92514 name 1 new) v6 features 55169095435288575
2016-11-23 17:19:11.331009 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints
2016-11-23 17:19:11.331129 7f8f173d6700 10 _calc_signature seq 442670678 front_crc_ = 1084090475 middle_crc = 0 data_crc = 0 sig = 15627235992780641097
2016-11-23 17:19:11.331164 7f8f173d6700 20 Putting signature in client message(seq # 442670678): sig = 15627235992780641097
2016-11-23 17:19:13.344756 7f8f20c60700 10 _calc_signature seq 367 front_crc_ = 1411686358 middle_crc = 0 data_crc = 0 sig = 10295634500541529978
2016-11-23 17:19:13.344931 7f8f193da700 20 mon.0@0(leader) e1 have connection
2016-11-23 17:19:13.344940 7f8f193da700 20 mon.0@0(leader) e1 ms_dispatch existing session MonSession: mon.? 10.75.13.132:6789/0 is open allow * for mon.? 10.75.13.132:6789/0
2016-11-23 17:19:13.344952 7f8f193da700 20 mon.0@0(leader) e1  caps allow *
2016-11-23 17:19:13.344959 7f8f193da700 20 is_capable service=mon command= read on cap allow *
2016-11-23 17:19:13.344962 7f8f193da700 20  allow so far , doing grant allow *
2016-11-23 17:19:13.344964 7f8f193da700 20  allow all
2016-11-23 17:19:13.344967 7f8f193da700 10 mon.0@0(leader) e1 handle_probe mon_probe(probe ca404f45-def3-4c22-a83b-7939e3f92514 name 1 new) v6
2016-11-23 17:19:13.344975 7f8f193da700 10 mon.0@0(leader) e1 handle_probe_probe mon.? 10.75.13.132:6789/0mon_probe(probe ca404f45-def3-4c22-a83b-7939e3f92514 name 1 new) v6 features 55169095435288575
2016-11-23 17:19:13.345019 7f8f193da700  1 mon.0@0(leader) e1  adding peer 10.75.13.132:6789/0 to list of hints

mon1 sometimes says things like:

2016-11-23 17:06:04.241491 7f7c3f855700  0 -- 10.75.13.132:6789/0 >> 10.75.13.131:6789/0 pipe(0x3ae4000 sd=13 :53558 s=2 pgs=106 cs=1 l=0 c=0x3937600).reader missed message?  skipped from seq 0 to 64927996
2016-11-23 17:06:04.241620 7f7c41859700  0 mon.1@1(probing) e1  my rank is now -1 (was 1)
2016-11-23 17:06:04.242622 7f7c3f855700  0 -- 10.75.13.132:6789/0 >> 10.75.13.131:6789/0 pipe(0x3ae4000 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x3938260).accept connect_seq 2 vs existing 0 state connecting
2016-11-23 17:06:04.242633 7f7c3f855700  0 -- 10.75.13.132:6789/0 >> 10.75.13.131:6789/0 pipe(0x3ae4000 sd=22 :6789 s=0 pgs=0 cs=0 l=0 c=0x3938260).accept we reset (peer sent cseq 2, 0x3ae9000.cseq = 0), sending RESETSESSION
2016-11-23 17:06:04.243404 7f7c3f855700  0 -- 10.75.13.132:6789/0 >> 10.75.13.131:6789/0 pipe(0x3ae9000 sd=13 :53560 s=2 pgs=108 cs=1 l=0 c=0x3937e40).reader missed message?  skipped from seq 0 to 442670313

but I can't tell whether that's a problem or just business as usual.

I've tried various things, including forcing quorum and forcing a resync, with no success. The network is connected through a bridge on a bonded (EtherChannel) interface, but ping works well. All of the hosts run Debian-based Linux.
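In case it helps anyone reproduce: since "ping works" with default small payloads doesn't prove that full-size frames survive the bridge + bond path, one check worth running between the mon hosts is whether unfragmented, MTU-sized packets get through (the "reader missed message? skipped from seq" lines could be consistent with larger messages being dropped somewhere). This is only a sketch; mtu_check is a helper name I made up, and the sizes assume a standard 1500-byte MTU:

```shell
# '-M do' sets the Don't Fragment bit on Linux iputils ping, so an oversized
# packet produces an error instead of being silently fragmented.
# 1472 = 1500-byte MTU minus 28 bytes of IP + ICMP headers
# (use 8972 instead if the bond runs jumbo frames).
mtu_check() {
    local peer=$1 payload=$2
    if ping -M do -c 1 -w 2 -s "$payload" "$peer" >/dev/null 2>&1; then
        echo "ok: ${payload}-byte payload reached ${peer} unfragmented"
    else
        echo "FAIL: ${payload}-byte payload did not reach ${peer}"
    fi
}

# Example, run from mon1 against mon0's public address:
# mtu_check 10.75.13.131 1472
```

If the default 56-byte ping succeeds but the 1472-byte don't-fragment ping fails, the bridge/bond is eating large frames, which would explain small probe traffic working while monitor sync never completes.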
As far as I can see it shouldn't use multicast, so no multicast-related problems should come into the equation. The whole system has been purged several times, at various levels. It is supposed to work, as the configuration is exactly the same as on other working setups. I'm kind of out of ideas where to look.

Thanks,
Peter
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com