Hi all,

Two days ago I upgraded one of my mons from 0.61.4 to 0.61.6, and the mon failed to start. I checked the mailing list and found other reports of mons failing after upgrading to 0.61.6, so I waited for the next release and upgraded the failed mon from 0.61.6 to 0.61.7. The mon still fails to start.

Here is the mon log:

root@atlas3-c1:/var/log/ceph# tail -100 /var/log/ceph/ceph-mon.atlas3-c1.log
2013-07-26 10:45:56.782321 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:56.782329 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53442 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.781375 7fa7e123c700 4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c574b0
2013-07-26 10:45:58.781386 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
2013-07-26 10:45:58.781389 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
2013-07-26 10:45:58.781392 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781395 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
2013-07-26 10:45:58.781398 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
2013-07-26 10:45:58.781400 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781402 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
2013-07-26 10:45:58.781404 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
2013-07-26 10:45:58.781411 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:45:58.781414 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57440 after 2 seconds
2013-07-26 10:45:58.781424 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
2013-07-26 10:45:58.781833 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.781853 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782037 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782165 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:45:58.782171 7fa7df938700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782171 7fa7e696c700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782177 7fa7df938700 0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=33 :40770 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782179 7fa7e696c700 0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48828 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782399 7fa7dfa39700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782418 7fa7dfa39700 0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=32 :44505 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:45:58.782447 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:45:58.782455 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=31 :53445 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.733745 7fa7e123c700 11 mon.atlas3-c1@0(probing) e4 tick
2013-07-26 10:46:00.781471 7fa7e123c700 4 mon.atlas3-c1@0(probing) e4 probe_timeout 0x1c57440
2013-07-26 10:46:00.781479 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 bootstrap
2013-07-26 10:46:00.781481 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 unregister_cluster_logger - not registered
2013-07-26 10:46:00.781483 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781486 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_sync
2013-07-26 10:46:00.781488 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset
2013-07-26 10:46:00.781490 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781492 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 timecheck_finish
2013-07-26 10:46:00.781495 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 scrub_reset
2013-07-26 10:46:00.781500 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 cancel_probe_timeout (none scheduled)
2013-07-26 10:46:00.781502 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 reset_probe_timeout 0x1c57590 after 2 seconds
2013-07-26 10:46:00.781511 7fa7e123c700 10 mon.atlas3-c1@0(probing) e4 probing other monitors
2013-07-26 10:46:00.781984 7fa7dfa39700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782005 7fa7e696c700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782204 7fa7df938700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782326 7fa7df837700 10 mon.atlas3-c1@0(probing) e4 ms_get_authorizer for mon
2013-07-26 10:46:00.782399 7fa7dfa39700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782399 7fa7e696c700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782413 7fa7dfa39700 0 -- 172.18.185.73:6789/0 >> 172.18.185.77:6789/0 pipe(0x1c91780 sd=31 :44508 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782416 7fa7e696c700 0 -- 172.18.185.73:6789/0 >> 172.18.185.74:6789/0 pipe(0x1c91a00 sd=30 :48835 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782491 7fa7df938700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782508 7fa7df938700 0 -- 172.18.185.73:6789/0 >> 172.18.185.78:6789/0 pipe(0x1c91280 sd=32 :40772 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-07-26 10:46:00.782598 7fa7df837700 0 cephx: verify_reply coudln't decrypt with error: error decoding block for decryption
2013-07-26 10:46:00.782606 7fa7df837700 0 -- 172.18.185.73:6789/0 >> 172.18.185.79:6789/0 pipe(0x1c91c80 sd=33 :53449 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply

Any idea how to fix this? I have 5 mons in total, and only this one has failed after upgrading. The rest are still running 0.61.4, and I don't dare upgrade them.

Thanks.
Keith