Re: [ceph-users] ceph 0.59 cephx problem

(Re-CC'ing the list)

On 03/22/2013 01:36 PM, Steffen Thorhauer wrote:
I was upgrading from 0.58 to ceph version 0.59 (cbae6a435c62899f857775f66659de052fb0e759).
Upgrading from 0.57 to 0.58 was an easy one, so I was surprised by the problems.

v0.59 is the first dev release with a major monitor rework. We've tested it thoroughly over the past weeks, but different usages tend to trigger different behaviours, so you might just have hit one of those buggers.

It seems to me that I made a fatal error that I don't understand.
I had 5 working mons (mon.{0-4}). After the upgrade of the first node I
lost mon.4 with the cephx error. Then I upgraded all of the nodes and
lost mon.0 with the startup error.

The v0.59 monitors are unable to communicate with the <=0.58 monitors, so that's likely why the monitor appeared to be lost: you need at least a majority of monitors on v0.59 before they can form a quorum.
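
If you want to double-check which monitors are actually in the quorum, something along these lines should work (just a sketch; the admin socket path and mon id below are the defaults and may differ on your setup):

  # ask the cluster which monitors currently form the quorum
  ceph quorum_status

  # or ask a single monitor directly via its admin socket; this
  # works even when cluster-wide commands hang for lack of quorum
  ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok mon_status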

After some restarts it looked like the other mons lost quorum,
so ceph -s or any other ceph command didn't work anymore.

As long as you have a majority of monitors running v0.59, they ought to be able to form a quorum. If they didn't, then something weird must have happened and logs would be much appreciated!

So today I decided to reinstall the test "cluster".

You decided to go back to v0.58, is that it? Regardless, if you have logs that could provide some insight into what happened, we'd really appreciate it.

Thanks!

  -Joao


-Steffen

Btw, ceph rbd and adding/removing osds work great.

On Fri, Mar 22, 2013 at 10:01:10AM +0000, Joao Eduardo Luis wrote:
On 03/21/2013 03:47 PM, Steffen Thorhauer wrote:
I think I was impatient and should have waited for the v0.59 announcement. It
seems I should upgrade all monitors.
After upgrading all nodes I get errors like this on 2 monitors:
=== mon.0 ===
Starting Ceph mon.0 on u124-161-ceph...
mon fs missing 'monmap/latest' and 'mkfs/monmap'
failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i 0 --pid-file
/var/run/ceph/mon.0.pid -c /etc/ceph/ceph.conf '

Steffen

Which version are you upgrading from?

Also, could you provide us with some logs of those monitors with 'debug
mon = 20' ?
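
If it helps, something along these lines should get those logs (a sketch; the mon id, paths and init invocation are just defaults from a standard install, adjust them to your nodes):

  # in /etc/ceph/ceph.conf on the affected monitor host:
  #   [mon]
  #       debug mon = 20
  # then restart that monitor and send us its log
  service ceph restart mon.0
  less /var/log/ceph/ceph-mon.0.log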

   -Joao



On 03/21/2013 02:22 PM, Steffen Thorhauer wrote:
Hi,
I just upgraded one node of my ceph "cluster". I wanted to upgrade node
by node.
The osd on this node has no problem, but the mon (mon.4) has
authorization problems.
I didn't change any config, just did an apt-get upgrade.
ceph -s
   health HEALTH_WARN 1 mons down, quorum 0,1,2,3 0,1,2,3
   monmap e2: 5 mons at
{0=10.37.124.161:6789/0,1=10.37.124.162:6789/0,2=10.37.124.163:6789/0,3=10.37.124.164:6789/0,4=10.37.124.167:6789/0},
election epoch 162, quorum 0,1,2,3 0,1,2,3
   osdmap e4839: 16 osds: 16 up, 16 in
    pgmap v195213: 3144 pgs: 3144 active+clean; 255 GB data, 820 GB
used, 778 GB / 1599 GB avail
   mdsmap e54723: 1/1/1 up {0=0=up:active}, 3 up:standby


But the mon.4 log file looks like:

2013-03-21 12:45:15.701747 7f45412c6780  2 mon.4@-1(probing) e2 init
2013-03-21 12:45:15.702051 7f45412c6780 10 mon.4@-1(probing) e2 bootstrap
2013-03-21 12:45:15.702094 7f45412c6780 10 mon.4@-1(probing) e2
unregister_cluster_logger - not registered
2013-03-21 12:45:15.702121 7f45412c6780 10 mon.4@-1(probing) e2
cancel_probe_timeout (none scheduled)
2013-03-21 12:45:15.702147 7f45412c6780  0 mon.4@-1(probing) e2 my
rank is now 4 (was -1)
2013-03-21 12:45:15.702190 7f45412c6780 10 mon.4@4(probing) e2 reset_sync
2013-03-21 12:45:15.702213 7f45412c6780 10 mon.4@4(probing) e2 reset
2013-03-21 12:45:15.702238 7f45412c6780 10 mon.4@4(probing) e2
timecheck_finish
2013-03-21 12:45:15.702286 7f45412c6780 10 mon.4@4(probing) e2
cancel_probe_timeout (none scheduled)
2013-03-21 12:45:15.702312 7f45412c6780 10 mon.4@4(probing) e2
reset_probe_timeout 0x24d6580 after 2 seconds
2013-03-21 12:45:15.702387 7f45412c6780 10 mon.4@4(probing) e2 probing
other monitors
2013-03-21 12:45:15.703459 7f453a15f700 10 mon.4@4(probing) e2
ms_get_authorizer for mon
2013-03-21 12:45:15.703641 7f453a15f700 10 cephx: build_service_ticket
service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
2013-03-21 12:45:15.703642 7f453a361700 10 mon.4@4(probing) e2
ms_get_authorizer for mon
2013-03-21 12:45:15.703694 7f453a361700 10 cephx: build_service_ticket
service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
2013-03-21 12:45:15.703869 7f453a260700 10 mon.4@4(probing) e2
ms_get_authorizer for mon
2013-03-21 12:45:15.703957 7f453a260700 10 cephx: build_service_ticket
service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
2013-03-21 12:45:15.704244 7f453a05e700 10 mon.4@4(probing) e2
ms_get_authorizer for mon
2013-03-21 12:45:15.704306 7f453a05e700 10 cephx: build_service_ticket
service mon secret_id 18446744073709551615 ticket_info.ticket.name=mon.
2013-03-21 12:45:15.704323 7f453a361700  0 cephx: verify_reply
coudln't decrypt with error: error decoding block for decryption
2013-03-21 12:45:15.704333 7f453a361700  0 -- 10.37.124.167:6789/0 >>
10.37.124.161:6789/0 pipe(0x24f3c80 sd=29 :42310 s=1 pgs=0 cs=0
l=0).failed verifying authorize reply
2013-03-21 12:45:15.704404 7f453a361700  0 -- 10.37.124.167:6789/0 >>
10.37.124.161:6789/0 pipe(0x24f3c80 sd=29 :42310 s=1 pgs=0 cs=0
l=0).fault
2013-03-21 12:45:15.704429 7f453a15f700  0 cephx: verify_reply
coudln't decrypt with error: error decoding block for decryption
2013-03-21 12:45:15.704483 7f453a15f700  0 -- 10.37.124.167:6789/0 >>
10.37.124.163:6789/0 pipe(0x24f3500 sd=31 :60255 s=1 pgs=0 cs=0
l=0).failed verifying authorize reply
2013-03-21 12:45:15.704517 7f453a260700  0 cephx: verify_reply
coudln't decrypt with error: error decoding block for decryption
2013-03-21 12:45:15.704578 7f453a15f700  0 -- 10.37.124.167:6789/0 >>
10.37.124.163:6789/0 pipe(0x24f3500 sd=31 :60255 s=1 pgs=0 cs=0
l=0).fault
2013-03-21 12:45:15.704529 7f453a260700  0 -- 10.37.124.167:6789/0 >>
10.37.124.162:6789/0 pipe(0x24f3a00 sd=30 :55445 s=1 pgs=0 cs=0
l=0).failed verifying authorize reply

What now??

Regards,
  Steffen

