Re: ceph-deploy, single mon not in quorum

Hi guys, thank you very much for your feedback. I'm new to Ceph, so please be patient with my newbie-ness.

I'm dealing with the same issue, although I'm not using ceph-deploy. For learning purposes I manually installed a small three-node test cluster: one node hosts the single mon and the other two host the OSDs. I had this working and everything looked healthy. I then simulated a catastrophic event by pulling the plug on all three nodes, and since then I haven't been able to get things working again: the single mon never reaches quorum and a ceph-create-keys process hangs forever.


This is my ceph.conf:

http://pastebin.com/qyqeu5E4
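
(In case the pastebin ever disappears: a minimal single-mon ceph.conf for a layout like mine would look roughly like the following. The fsid and mon address are taken from the mon_status output further down; the auth settings are the stock cephx defaults and are assumed, not copied from my file.)

[global]
fsid = e0696edf-ac8d-4095-beaf-6a2592964060
mon initial members = ceph0
mon host = 192.168.10.200
auth cluster required = cephx
auth service required = cephx
auth client required = cephx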


This is what the ceph-related process list looks like on the mon node after a reboot; note that ceph-create-keys just hangs there:

root@ceph0:/var/log/ceph# ps aux | grep ceph
root       988  0.2  0.2  34204  7368 ?        S    15:36   0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i cehp0
root      1449  0.0  0.1  94844  3972 ?        Ss   15:38   0:00 sshd: ceph [priv]  
ceph      1470  0.0  0.0  94844  1740 ?        S    15:38   0:00 sshd: ceph@pts/0   
ceph      1471  0.3  0.1  22308  3384 pts/0    Ss   15:38   0:00 -bash
root      1670  0.0  0.0   9452   904 pts/0    R+   15:38   0:00 grep --color=auto ceph

So as you can see, no mon process gets started; I presume this is somehow a result of the hanging ceph-create-keys process. In this state, after a reboot, /var/log/ceph-mon.cehp0.log shows the following:

2014-01-09 15:49:44.433943 7f9e45eb97c0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 972
2014-01-09 15:49:44.535436 7f9e45eb97c0 -1 failed to create new leveldb store
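
If I read that right, "failed to create new leveldb store" means the boot-time mon couldn't open or create its backing store. Assuming the default layout, where each mon keeps its store under /var/lib/ceph/mon/<cluster>-<id>, something like this should show which mon data directories actually exist on the node (path assumed, adjust if your mon data lives elsewhere):

ls -ld /var/lib/ceph/mon/ceph-*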

If I manually start the mon process with:

start ceph-mon id=ceph0

it starts fine, and "ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph0.asok mon_status" outputs:

{ "name": "ceph0",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "e0696edf-ac8d-4095-beaf-6a2592964060",
      "modified": "2014-01-08 02:00:23.264895",
      "created": "2014-01-08 02:00:23.264895",
      "mons": [
            { "rank": 0,
              "name": "ceph0",
              "addr": "192.168.10.200:6789\/0"}]}}


The mon process itself seems fine, and mon_status even reports it as the leader of a one-member quorum, but ceph-create-keys keeps hanging as if quorum had never formed.

If I kill the ceph-create-keys process and run "/usr/bin/python /usr/sbin/ceph-create-keys -i cehp0" manually, I get:

"admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet."
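
The socket path it polls is derived from the id given with -i, so it seems worth comparing what the tool is waiting for with what the running mon actually created (socket naming assumes the default /var/run/ceph/<cluster>-mon.<id>.asok scheme):

# the socket "ceph-create-keys -i cehp0" polls for:
ls -l /var/run/ceph/ceph-mon.cehp0.asok
# the socket the mon started with id=ceph0 actually creates:
ls -l /var/run/ceph/ceph-mon.ceph0.asok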

every second or so. This is what happens when I terminate the manually started ceph-create-keys process:

^CTraceback (most recent call last):
  File "/usr/sbin/ceph-create-keys", line 227, in <module>
    main()
  File "/usr/sbin/ceph-create-keys", line 213, in main
    wait_for_quorum(cluster=args.cluster, mon_id=args.id)
  File "/usr/sbin/ceph-create-keys", line 34, in wait_for_quorum
    time.sleep(1)
KeyboardInterrupt
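
So it dies inside wait_for_quorum(), in the time.sleep(1) of its polling loop. As far as I can tell from the traceback and the INFO message above, that loop boils down to roughly this (just a sketch to show the behaviour, not the actual ceph-create-keys source):

# MON_ID is whatever was passed with -i; the real tool additionally
# checks that mon_status reports state "leader" or "peon" before moving on
while ! ceph --admin-daemon "/var/run/ceph/ceph-mon.${MON_ID}.asok" mon_status >/dev/null 2>&1; do
    echo "INFO:ceph-create-keys:ceph-mon admin socket not ready yet."
    sleep 1
done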


I will finish this long post by pasting what happens if I try to restart all services on the cluster, just so you know that the mon problem is only the first problem I'm battling with here :)

http://pastebin.com/mPGhiYu5
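
(For context: by "restart all services" I mean cycling the packaged upstart jobs on each node, along the lines of the following; the exact invocation is assumed from the upstart-based setup.)

sudo stop ceph-all
sudo start ceph-all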

Please note that after the global restart above, the hanging ceph-create-keys process is back.


Best,
Moe

On 01/09/2014 09:51 AM, Travis Rhoden wrote:
> On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza <alfredo.deza@xxxxxxxxxxx> wrote:
>> On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>>> Hi Mordur,
>>>
>>> I'm definitely straining my memory on this one, but happy to help if I can.
>>>
>>> I'm pretty sure I did not figure it out -- you can see I didn't get
>>> any feedback from the list.  What I did do, however, was uninstall
>>> everything and try the same setup with mkcephfs, which worked fine at
>>> the time.  This was 8 months ago, though, and I have since used
>>> ceph-deploy many times with great success.  I am not sure if I have
>>> ever tried a similar setup, though, with just one node and one
>>> monitor.  Fortuitously, I may be trying that very setup today or
>>> tomorrow.  If I still have issues, I will be sure to post them here.
>>>
>>> Are you using both the latest ceph-deploy and the latest Ceph packages
>>> (Emperor or newer dev packages)?  There have been lots of changes in
>>> the monitor area, including in the upstart scripts, that made many
>>> things more robust in this area.  I did have a cluster a few months
>>> ago that had a flaky monitor that refused to join quorum after
>>> install, and I had to just blow it away and re-install/deploy it and
>>> then it was fine, which I thought was odd.
>>>
>>> Sorry, that's probably not much help.
>>>
>>>  - Travis
>>>
>>> On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson <rass@xxxxxxx> wrote:
>>>> Hi Travis,
>>>>
>>>> Did you figure this out? I'm dealing with exactly the same thing over here.
>>>>
>>>> Best,
>>>> Moe
>> Can you share what exactly you are having problems with? ceph-deploy's
>> log output has been much improved and it is super useful to have that
>> when dealing with possible issues.
> I do not, it was long long ago...  And in case it was ambiguous, let
> me explicitly say I was not recommending the use of mkcephfs at all
> (is that even still possible?).  ceph-deploy is certainly the tool to
> use.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
