Hi,

before I get to my questions, I want to thank you for the good work done on Ceph. I learned about Ceph in an Admin-Magazin article [1] and was surprised how easy it was to set up Ceph by following the article. Trying new software without hitting any errors, warnings, or other problems is very rare, and I was very impressed by the easy installation and configuration.

Later on I had some smaller problems when I tried to increase the number of mons and OSDs and to add a standby MDS, but I managed to figure them out using the man pages and the web.

Now I have a problem that I don't know how to fix. First, some information about my setup:

ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)

ceph.conf
------------
[global]
        ; Enable authentication between hosts within the cluster.
        auth supported = cephx
        keyring = /etc/ceph/$name.keyring

[mon]
        mon data = /srv/mon.$id

[mds]

[osd]
        osd data = /srv/osd.$id
        osd journal = /srv/osd.$id.journal
        osd journal size = 1000

[mon.b]
        host = hpb020102
        mon addr = 10.23.3.2:6789

[mon.c]
        host = hpb020103
        mon addr = 10.23.3.3:6789

[mon.d]
        host = hpb020104
        mon addr = 10.23.3.4:6789

[mon.e]
        host = hpb020105
        mon addr = 10.23.3.5:6789

[mon.f]
        host = hpb020106
        mon addr = 10.23.3.6:6789

[osd.2]
        host = hpb020102

[osd.3]
        host = hpb020103

[osd.4]
        host = hpb020104

[osd.5]
        host = hpb020105

[osd.6]
        host = hpb020106

[mds.a]
        host = hpb020104

[mds.b]
        host = hpb020105
        mds standby replay = true
------------

/srv/osd.* are on an XFS partition.

Before my holiday I found log entries indicating that there might be a problem with one of my MDSs. The problem is still present:

2013-01-14 15:32:41.943304 mds e515692: 1/1/1 up {0=a=up:active}, 1 up:standby-replay, 5 up:oneshot-replay(laggy or crashed)

I tried to increase the log level to get some debug information. After my holiday I found that the Ceph logs, mostly the mon log, had filled my / filesystem. At first I thought that debugging was still active, but on a closer look I found that somehow the mon. key could not be found by mon.e:

2013-01-14 15:44:52.007632 7fad1e728700  0 mon.e@3(probing) e3 couldn't get secret for mon service
2013-01-14 15:44:52.007655 7fad17ee9700  0 mon.e@3(probing) e3 couldn't get secret for mon service
2013-01-14 15:44:52.007659 7fad1e728700  0 mon.e@3(probing) e3 no installed auth entries!
2013-01-14 15:44:52.007662 7fad17ee9700  0 mon.e@3(probing) e3 no installed auth entries!
2013-01-14 15:44:52.007860 7fad17ee9700  0 -- 10.23.3.5:6789/0 >> 10.23.3.3:6789/0 pipe(0x8e7190 sd=19 pgs=0 cs=0 l=0).connect got BADAUTHORIZER
2013-01-14 15:44:52.007860 7fad1e728700  0 -- 10.23.3.5:6789/0 >> 10.23.3.2:6789/0 pipe(0x8e6870 sd=18 pgs=0 cs=0 l=0).connect got BADAUTHORIZER

So I guess that, while trying to get more information, I somehow managed to delete the mon. key. I was unable to retrieve the history because of the full filesystem.

I then tried to use "ceph auth" and ceph-authtool to (re-)add the mon. key, but the only result is that mon.d is now unable to authenticate as well.

So far, all I know is that I don't understand how cephx works. "ceph auth list" shows the same key for mon. on all servers. But since it takes longer on hpb020104 and hpb020105, I guess the command contacts some other mon servers there, as mon.d and mon.e are out of quorum.

How can I get information about the mon. key for mon.d and mon.e if they are not running or are out of quorum?

How can I add or change the mon. key?

"/etc/ceph/" has keyrings for admin, client.admin, mds.* and osd.*, but none for mon. or mon.*. Is this correct?

Best regards

Michael Menge

[1] http://www.admin-magazin.de/Das-Heft/2012/03/Der-RADOS-Objectstore-und-Ceph-Teil-1/%28language%29/ger-DE
--------------------------------------------------------------------------------
M.Menge                                Tel.: (49) 7071/29-70316
Universität Tübingen                   Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung          mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen