Hi Joao,

Oh! Yes. Thanks for pointing it out to me. It works after I upgraded all mons to 0.61.7.

Keith

----- Original Message -----
From: "Joao Eduardo Luis" <joao.luis@xxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Friday, July 26, 2013 10:36:57 PM
Subject: Re: Upgrade from 0.61.4 to 0.61.6 mon failed. Upgrade to 0.61.7 mon still failed.

On 26-07-2013 03:49, Keith Phua wrote:
> Hi all,
>
> 2 days ago, I upgraded one of my mons from 0.61.4 to 0.61.6. The mon failed to start. I checked the mailing list and found reports of mons failing after upgrading to 0.61.6. So I waited for the next release and upgraded the failed mon from 0.61.6 to 0.61.7. My mon still fails to start up.

Keith,

In 0.61.5's release notes [1] one can find a mention that 0.61.4 monitors won't be able to form a quorum with 0.61.5+ monitors. That is due to fixing a bug related to feature bits.

If you are to upgrade your monitors to 0.61.5 or above (at this moment we'd recommend 0.61.7), you probably should upgrade all your monitors at once.

-Joao

[1] - http://ceph.com/docs/master/release-notes/#v0-61-5-cuttlefish

>
> Here is the mon log:
>
> root@atlas3-c1:/var/log/ceph# tail -100 /var/log/ceph/ceph-mon.atlas3-c1.log
> 2013-07-26 10:45:56.782321 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:45:56.782329 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53442 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:45:58.781375 7fa7e123c700 4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c574b0
> 2013-07-26 10:45:58.781386 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
> 2013-07-26 10:45:58.781389 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
> 2013-07-26 10:45:58.781392 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:45:58.781395 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
> 2013-07-26 10:45:58.781398 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
> 2013-07-26 10:45:58.781400 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:45:58.781402 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
> 2013-07-26 10:45:58.781404 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
> 2013-07-26 10:45:58.781411 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:45:58.781414 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57440 after 2 seconds
> 2013-07-26 10:45:58.781424 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
> 2013-07-26 10:45:58.781833 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:45:58.781853 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:45:58.782037 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:45:58.782165 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:45:58.782171 7fa7df938700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:45:58.782171 7fa7e696c700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:45:58.782177 7fa7df938700 0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=33 :40770 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:45:58.782179 7fa7e696c700 0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48828 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:45:58.782399 7fa7dfa39700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:45:58.782418 7fa7dfa39700 0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=32 :44505 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:45:58.782447 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:45:58.782455 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=31 :53445 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:46:00.733745 7fa7e123c700 11 mon.atlas3-c1@0(probing) e4 tick
> 2013-07-26 10:46:00.781471 7fa7e123c700 4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c57440
> 2013-07-26 10:46:00.781479 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
> 2013-07-26 10:46:00.781481 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
> 2013-07-26 10:46:00.781483 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:46:00.781486 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
> 2013-07-26 10:46:00.781488 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
> 2013-07-26 10:46:00.781490 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:46:00.781492 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
> 2013-07-26 10:46:00.781495 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
> 2013-07-26 10:46:00.781500 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
> 2013-07-26 10:46:00.781502 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57590 after 2 seconds
> 2013-07-26 10:46:00.781511 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
> 2013-07-26 10:46:00.781984 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:46:00.782005 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:46:00.782204 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:46:00.782326 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
> 2013-07-26 10:46:00.782399 7fa7dfa39700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:46:00.782399 7fa7e696c700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:46:00.782413 7fa7dfa39700 0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=31 :44508 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:46:00.782416 7fa7e696c700 0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48835 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:46:00.782491 7fa7df938700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:46:00.782508 7fa7df938700 0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=32 :40772 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
> 2013-07-26 10:46:00.782598 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
> 2013-07-26 10:46:00.782606 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53449 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
>
> Any idea how to fix this? I have a total of 5 mons, and 1 has failed after upgrading. The rest are still running 0.61.4, which I don't dare to upgrade.
>
> Thanks.
>
> Keith
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
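
Editorial note: the compatibility rule Joao describes (0.61.4 monitors cannot form a quorum with 0.61.5+ monitors) can be sketched as a small pre-upgrade check. This is only an illustration under stated assumptions, not a Ceph tool: the function names are made up for this sketch, the monitor names other than atlas3-c1 are hypothetical, and treating every release below 0.61.5 as "old" is an assumption that goes slightly beyond what the thread states. The version strings would have to be collected by hand from each monitor host.

```python
from typing import Dict, List, Tuple

# Boundary taken from the thread: 0.61.4 monitors cannot form a quorum
# with 0.61.5+ monitors. Treating everything below 0.61.5 as "old" is
# an assumption made for this sketch.
BOUNDARY = (0, 61, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.61.7' into (0, 61, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def split_by_boundary(mon_versions: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (old, new) monitor names relative to BOUNDARY, sorted by name."""
    old: List[str] = []
    new: List[str] = []
    for name, version in sorted(mon_versions.items()):
        if parse_version(version) >= BOUNDARY:
            new.append(name)
        else:
            old.append(name)
    return old, new


def isolated_monitors(mon_versions: Dict[str, str]) -> List[str]:
    """Monitors in the smaller group, which cannot join the larger one.

    Returns an empty list when all monitors are on the same side of the
    boundary. On a tie the newer group is returned.
    """
    old, new = split_by_boundary(mon_versions)
    if not old or not new:
        return []
    return old if len(old) < len(new) else new


# Keith's situation: five monitors, one upgraded, four still on 0.61.4.
# Only atlas3-c1 is a real name from the log; the others are placeholders.
cluster = {
    "atlas3-c1": "0.61.7",
    "mon-b": "0.61.4",
    "mon-c": "0.61.4",
    "mon-d": "0.61.4",
    "mon-e": "0.61.4",
}
print(isolated_monitors(cluster))
```

With the mixed cluster above, the upgraded monitor is the one left out, which matches the probing loop in the log. Once every entry reads 0.61.7 the function returns an empty list, which corresponds to Keith's report that things worked after upgrading all mons.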