Re: Upgrade from 0.61.4 to 0.61.6 mon failed. Upgrade to 0.61.7 mon still failed.


 



On 26-07-2013 03:49, Keith Phua wrote:
Hi all,

Two days ago, I upgraded one of my mons from 0.61.4 to 0.61.6. The mon failed to start. I checked the mailing list and found reports of mons failing after upgrading to 0.61.6, so I waited for the next release and upgraded the failed mon from 0.61.6 to 0.61.7. My mon still fails to start.

Keith,

The 0.61.5 release notes [1] mention that 0.61.4 monitors won't be able to form a quorum with 0.61.5+ monitors. This is a side effect of fixing a bug related to feature bits. If you upgrade your monitors to 0.61.5 or above (at this moment we'd recommend 0.61.7), you should probably upgrade all of your monitors at once.

  -Joao

[1] - http://ceph.com/docs/master/release-notes/#v0-61-5-cuttlefish
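
To make the quorum point concrete, here is a minimal sketch of the arithmetic for a cluster of 5 monitors. It assumes that old (0.61.4) and upgraded (0.61.5+) monitors cannot talk to each other at all, so each group needs a strict majority of the full monitor count on its own. The function names are made up for illustration and are not part of Ceph.

```python
def majority(total_mons: int) -> int:
    """Smallest number of monitors that is a strict majority."""
    return total_mons // 2 + 1


def quorum_status(total_mons: int, upgraded: int) -> dict:
    """Report which group (old or upgraded) can still form a quorum.

    Assumption: old and upgraded monitors cannot form a quorum together,
    so each group must reach a majority of all monitors by itself.
    """
    need = majority(total_mons)
    old = total_mons - upgraded
    return {
        "needed": need,
        "old_has_quorum": old >= need,
        "upgraded_has_quorum": upgraded >= need,
    }


if __name__ == "__main__":
    # Walk through upgrading 0..5 of the 5 monitors one at a time.
    for upgraded in range(0, 6):
        print(upgraded, quorum_status(5, upgraded))
```

Under that assumption, a single upgraded monitor out of 5 can never reach the 3 monitors it needs, while the 4 old ones keep their quorum. That would be consistent with one upgraded mon being unable to join while the rest of the cluster keeps running.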


Here is the mon log:

root@atlas3-c1:/var/log/ceph# tail -100 /var/log/ceph/ceph-mon.atlas3-c1.log
2013-07-26 10:45:56.782321 7fa7df837700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:56.782329 7fa7df837700  0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53442 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.781375 7fa7e123c700  4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c574b0
2013-07-26 10:45:58.781386 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
2013-07-26 10:45:58.781389 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
2013-07-26 10:45:58.781392 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781395 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
2013-07-26 10:45:58.781398 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
2013-07-26 10:45:58.781400 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781402 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
2013-07-26 10:45:58.781404 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
2013-07-26 10:45:58.781411 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781414 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57440 after 2 seconds
2013-07-26 10:45:58.781424 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
2013-07-26 10:45:58.781833 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.781853 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782037 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782165 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782171 7fa7df938700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782171 7fa7e696c700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782177 7fa7df938700  0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=33 :40770 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782179 7fa7e696c700  0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48828 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782399 7fa7dfa39700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782418 7fa7dfa39700  0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=32 :44505 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782447 7fa7df837700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782455 7fa7df837700  0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=31 :53445 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.733745 7fa7e123c700 11 mon.atlas3-c1@0(probing) e4 tick
2013-07-26 10:46:00.781471 7fa7e123c700  4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c57440
2013-07-26 10:46:00.781479 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
2013-07-26 10:46:00.781481 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
2013-07-26 10:46:00.781483 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781486 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
2013-07-26 10:46:00.781488 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
2013-07-26 10:46:00.781490 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781492 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
2013-07-26 10:46:00.781495 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
2013-07-26 10:46:00.781500 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781502 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57590 after 2 seconds
2013-07-26 10:46:00.781511 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
2013-07-26 10:46:00.781984 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782005 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782204 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782326 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782399 7fa7dfa39700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782399 7fa7e696c700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782413 7fa7dfa39700  0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=31 :44508 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782416 7fa7e696c700  0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48835 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782491 7fa7df938700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782508 7fa7df938700  0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=32 :40772 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782598 7fa7df837700  0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782606 7fa7df837700  0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53449 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
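
A log like the one above is easier to read once the repeated failures are grouped by peer. The following is a rough sketch, not a Ceph tool: it only assumes the line format shown in this log, and the names `FAILED_AUTH` and `count_failed_peers` are invented for this example.

```python
import re
from collections import Counter

# Matches lines of the form seen in the log above:
#   ... -- <local addr> >> <peer addr> pipe(...).failed verifying authorize reply
FAILED_AUTH = re.compile(
    r"-- (?P<local>\S+) >> (?P<peer>\S+) pipe\(.*\)\.failed verifying authorize reply"
)


def count_failed_peers(lines):
    """Count 'failed verifying authorize reply' lines per peer address."""
    counts = Counter()
    for line in lines:
        match = FAILED_AUTH.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


# Three lines copied from the log above, used as sample input.
SAMPLE = [
    "2013-07-26 10:45:56.782329 7fa7df837700  0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53442 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply",
    "2013-07-26 10:45:58.781424 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors",
    "2013-07-26 10:45:58.782455 7fa7df837700  0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=31 :53445 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply",
]

if __name__ == "__main__":
    for peer, n in count_failed_peers(SAMPLE).most_common():
        print(peer, n)
```

To run it against the real file, pass an open handle on /var/log/ceph/ceph-mon.atlas3-c1.log to `count_failed_peers` instead of `SAMPLE`. In the excerpt above, every failure points at one of the four other monitors (.74, .77, .78, .79), which shows that the upgraded mon fails against all of its peers rather than against one bad host.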

Any idea how to fix this? I have a total of 5 mons, and 1 has failed after upgrading. The other 4 are still running 0.61.4, and I don't dare upgrade them.

Thanks.

Keith
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com



