Hmm. This sounds very similar to the problem I reported (with debug-mon = 20 and debug ms = 1 logs as of today) on our support site (ticket #438) - Sage, please take a look.

On Mon, Aug 12, 2013 at 9:49 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Mon, 12 Aug 2013, Jeppesen, Nelson wrote:
>> Joao,
>>
>> (log file uploaded to http://pastebin.com/Ufrxn6fZ)
>>
>> I had some good luck and some bad luck. I copied the store.db to a new monitor, injected a modified monmap and started it up (This is all on the same host.) Very quickly it reached quorum (as far as I can tell) but didn't respond. Running 'ceph -w' just hung, no timeouts or errors. Same thing when restarting an OSD.
>>
>> The last lines of the log file '...ms_verify_authorizer..' are from 'ceph -w' attempts.
>>
>> I restarted everything again and it sat there synchronizing. IO stat reported about 100MB/s, but just reads. I let it sit there for 7 min but nothing happened.
>
> Can you do this again with --debug-mon 20 --debug-ms 1? It looks as
> though the main dispatch thread is blocked (7f71a1aa5700 does nothing
> after winning the election). It would also be helpful to gdb attach to
> the running ceph-mon and capture the output from 'thread apply all bt'.
>
>> Side question, how long can a ceph cluster run without a monitor? I was
>> able to upload files via rados gateway without issue even when the
>> monitor was down.
>
> Quite a while, as long as no new processes need to authenticate, and no
> nodes go up or down. Eventually the authentication keys are going to time
> out, though (1 hour is the default).
>
> sage
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com