Hi,

I'm running a 3-node test cluster on Ubuntu 12.04, without cephx authentication. I started out running 0.47.2 packages (an impatiently-smashed-together backport based on the upstream sources) and then upgraded to 0.48-1ubuntu1 (the packages from quantal rebuilt on precise). So my situation may be a bit special.

When I upgraded from 0.47.2 to 0.48, I didn't notice that my first monitor daemon hadn't restarted properly. I rolled through the upgrade and ended up with a system where "ceph -s" would hang, being unable to find a monitor willing to accept responsibility for the cluster.

I splashed around rather a lot turning on debug logging. The monitors tended to get as far as

2012-07-17 02:38:52.254856 7f3c3b862780 -1 auth: error reading file: /srv/ceph/mon.leningradskaya/keyring: can't open /srv/ceph/mon.leningradskaya/keyring: (2) No such file or directory
2012-07-17 02:38:52.254874 7f3c3b862780 -1 mon.leningradskaya@-1(probing) e1 unable to load initial keyring /etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
2012-07-17 02:38:53.006423 7f3c3b860700 1 -- 10.55.200.21:6789/0 >> :/0 pipe(0x7f3c2c0008c0 sd=17 pgs=0 cs=0 l=0).accept sd=17
2012-07-17 02:38:53.231137 7f3c386a1700 1 -- 10.55.200.21:6789/0 >> :/0 pipe(0x7f3c2c000f60 sd=18 pgs=0 cs=0 l=0).accept sd=18
2012-07-17 02:38:53.308857 7f3c3849f700 1 -- 10.55.200.21:6789/0 >> :/0 pipe(0x7f3c2c0015c0 sd=19 pgs=0 cs=0 l=0).accept sd=19
2012-07-17 02:38:53.668990 7f3c3829d700 1 -- 10.55.200.21:6789/0 >> :/0 pipe(0x7f3c2c001c20 sd=20 pgs=0 cs=0 l=0).accept sd=20

with lines like the last four streaming endlessly.

Eventually I tried creating an empty /srv/ceph/mon.leningradskaya/keyring and the monitor daemon started right up. When I applied the same change to the rest of the cluster, I was back in business.
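For anyone wanting to repeat the workaround on several monitors, here is a minimal sketch of the "create an empty keyring" step in Python. The helper name ensure_empty_keyring and the 0600 file mode are assumptions made for illustration; neither comes from Ceph. The sketch only creates the file when it is missing and never touches an existing keyring. Whether an empty keyring is the correct long-term fix is not established here; it is simply what got the monitors starting again in the situation described above.

```python
import os
import tempfile


def ensure_empty_keyring(mon_data_dir):
    """Create an empty 'keyring' file in mon_data_dir unless one exists.

    Hypothetical helper (not part of Ceph). Returns True if the file was
    created, False if a keyring was already present and left untouched.
    """
    path = os.path.join(mon_data_dir, "keyring")
    try:
        # O_EXCL makes the create fail instead of truncating an existing file.
        # The 0o600 mode is an assumption, not something Ceph is known to require.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    # Demonstrate against a throwaway directory, not a live mon data dir.
    with tempfile.TemporaryDirectory() as demo_dir:
        print(ensure_empty_keyring(demo_dir))  # first call creates the file
        print(ensure_empty_keyring(demo_dir))  # second call leaves it alone
```

On a real node the argument would be the monitor's data directory, e.g. /srv/ceph/mon.leningradskaya in the setup described here, and the monitor daemon would still need to be restarted afterwards.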
Here's a log snippet from a successful 0.48 monitor daemon startup:

2012-07-17 02:47:03.036077 7f5f2a66f780 2 auth: KeyRing::load: loaded key file /srv/ceph/mon.leningradskaya/keyring
2012-07-17 02:47:03.036283 7f5f2a66f780 10 mon.leningradskaya@-1(probing) e1 bootstrap
2012-07-17 02:47:03.036319 7f5f2a66f780 10 mon.leningradskaya@-1(probing) e1 unregister_cluster_logger - not registered
2012-07-17 02:47:03.036346 7f5f2a66f780 10 mon.leningradskaya@-1(probing) e1 cancel_probe_timeout (none scheduled)
2012-07-17 02:47:03.036383 7f5f2a66f780 0 mon.leningradskaya@-1(probing) e1 my rank is now 1 (was -1)

continuing to log more besides as the cluster came back up.

One of my colleagues tried something similar, but his monitor daemons came up like so:

2012-07-19 10:16:10.223092 7f9e20d22780 -1 auth: error reading file: /var/lib/ceph/mon/ceph-a/keyring: can't open /var/lib/ceph/mon/ceph-a/keyring: (2) No such file or directory
2012-07-19 10:16:10.235911 7f9e20d22780 1 mon.a@-1(probing) e1 copying mon. key from old db to external keyring

which is a little different -- is this "old db" something I should have ended up with after a regular no-cephx mkcephfs deployment?

And also, I ran the various mkcephfs steps individually to avoid having ssh across the whole cluster, so perhaps something fell through the cracks there...

Here's my ceph.conf, minus tedious OSD boilerplate:

[global]
    max open files = 131072
    log file = /var/log/ceph/$name.log
    pid file = /run/ceph/$name.pid

[mon]
    mon data = /srv/ceph/$name

[mon.prat]
    host = prat
    mon addr = 10.55.200.22:6789

[mon.jackass]
    host = jackass
    mon addr = 10.55.200.20:6789

[mon.leningradskaya]
    host = leningradskaya
    mon addr = 10.55.200.21:6789

Regards,
-- 
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood