Re: ceph-deploy, single mon not in quorum

Hi guys, thank you very much for your feedback. I'm new to Ceph, so please be patient with my newbie-ness.

I'm dealing with the same issue, although I'm not using ceph-deploy. For learning purposes I manually installed a small three-node test cluster: one node hosts the single mon and the other two host the OSDs. I had this working and everything looked healthy. I then simulated a catastrophic event by pulling the plug on all three nodes, and since then I haven't been able to get things working again: the single mon never reaches quorum and a ceph-create-keys process hangs forever.


This is my ceph.conf:

http://pastebin.com/qyqeu5E4
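
(In case the pastebin ever disappears: a minimal single-mon ceph.conf for a layout like mine would look roughly like the following. The fsid and mon address are taken from the mon_status output further down; the auth settings are the stock cephx defaults and are assumed, not copied from my file.)

[global]
fsid = e0696edf-ac8d-4095-beaf-6a2592964060
mon initial members = ceph0
mon host = 192.168.10.200
auth cluster required = cephx
auth service required = cephx
auth client required = cephx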


This is what the ceph-related process list looks like on the mon node after a reboot; note that ceph-create-keys just hangs there:

root@ceph0:/var/log/ceph# ps aux | grep ceph
root       988  0.2  0.2  34204  7368 ?        S    15:36   0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i cehp0
root      1449  0.0  0.1  94844  3972 ?        Ss   15:38   0:00 sshd: ceph [priv]  
ceph      1470  0.0  0.0  94844  1740 ?        S    15:38   0:00 sshd: ceph@pts/0   
ceph      1471  0.3  0.1  22308  3384 pts/0    Ss   15:38   0:00 -bash
root      1670  0.0  0.0   9452   904 pts/0    R+   15:38   0:00 grep --color=auto ceph

So as you can see, no mon process gets started; I presume this is somehow a result of the hanging ceph-create-keys process. In this state, after a reboot, /var/log/ceph-mon.cehp0.log shows the following:

2014-01-09 15:49:44.433943 7f9e45eb97c0  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 972
2014-01-09 15:49:44.535436 7f9e45eb97c0 -1 failed to create new leveldb store
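
If I read that right, "failed to create new leveldb store" means the boot-time mon couldn't open or create its backing store. Assuming the default layout, where each mon keeps its store under /var/lib/ceph/mon/<cluster>-<id>, something like this should show which mon data directories actually exist on the node (path assumed, adjust if your mon data lives elsewhere):

ls -ld /var/lib/ceph/mon/ceph-*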

If I manually start the mon process with:

start ceph-mon id=ceph0

it starts fine, and "ceph --admin-daemon=/var/run/ceph/ceph-mon.ceph0.asok mon_status" outputs:

{ "name": "ceph0",
  "rank": 0,
  "state": "leader",
  "election_epoch": 1,
  "quorum": [
        0],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "e0696edf-ac8d-4095-beaf-6a2592964060",
      "modified": "2014-01-08 02:00:23.264895",
      "created": "2014-01-08 02:00:23.264895",
      "mons": [
            { "rank": 0,
              "name": "ceph0",
              "addr": "192.168.10.200:6789\/0"}]}}


The mon process itself seems fine, and mon_status even reports it as the leader of a one-member quorum, but ceph-create-keys keeps hanging as if quorum had never formed.

If I kill the ceph-create-keys process and run "/usr/bin/python /usr/sbin/ceph-create-keys -i cehp0" manually, I get:

"admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet."
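
The socket path it polls is derived from the id given with -i, so it seems worth comparing what the tool is waiting for with what the running mon actually created (socket naming assumes the default /var/run/ceph/<cluster>-mon.<id>.asok scheme):

# the socket "ceph-create-keys -i cehp0" polls for:
ls -l /var/run/ceph/ceph-mon.cehp0.asok
# the socket the mon started with id=ceph0 actually creates:
ls -l /var/run/ceph/ceph-mon.ceph0.asok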

every second or so. This is what happens when I terminate the manually started ceph-create-keys process:

^CTraceback (most recent call last):
  File "/usr/sbin/ceph-create-keys", line 227, in <module>
    main()
  File "/usr/sbin/ceph-create-keys", line 213, in main
    wait_for_quorum(cluster=args.cluster, mon_id=args.id)
  File "/usr/sbin/ceph-create-keys", line 34, in wait_for_quorum
    time.sleep(1)
KeyboardInterrupt
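
So it dies inside wait_for_quorum(), in the time.sleep(1) of its polling loop. As far as I can tell from the traceback and the INFO message above, that loop boils down to roughly this (just a sketch to show the behaviour, not the actual ceph-create-keys source):

# MON_ID is whatever was passed with -i; the real tool additionally
# checks that mon_status reports state "leader" or "peon" before moving on
while ! ceph --admin-daemon "/var/run/ceph/ceph-mon.${MON_ID}.asok" mon_status >/dev/null 2>&1; do
    echo "INFO:ceph-create-keys:ceph-mon admin socket not ready yet."
    sleep 1
done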


I will finish this long post by pasting what happens if I try to restart all services on the cluster, just so you know that the mon problem is only the first problem I'm battling with here :)

http://pastebin.com/mPGhiYu5
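
(For context: by "restart all services" I mean cycling the packaged upstart jobs on each node, along the lines of the following; the exact invocation is assumed from the upstart-based setup.)

sudo stop ceph-all
sudo start ceph-all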

Please note that after the global restart above, the hanging ceph-create-keys process is back.


Best,
Moe

On 01/09/2014 09:51 AM, Travis Rhoden wrote:
> On Thu, Jan 9, 2014 at 9:48 AM, Alfredo Deza <alfredo.deza@xxxxxxxxxxx> wrote:
>> On Thu, Jan 9, 2014 at 9:45 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>>> Hi Mordur,
>>>
>>> I'm definitely straining my memory on this one, but happy to help if I can.
>>>
>>> I'm pretty sure I did not figure it out -- you can see I didn't get
>>> any feedback from the list.  What I did do, however, was uninstall
>>> everything and try the same setup with mkcephfs, which worked fine at
>>> the time.  This was 8 months ago, though, and I have since used
>>> ceph-deploy many times with great success.  I am not sure if I have
>>> ever tried a similar setup, though, with just one node and one
>>> monitor.  Fortuitously, I may be trying that very setup today or
>>> tomorrow.  If I still have issues, I will be sure to post them here.
>>>
>>> Are you using both the latest ceph-deploy and the latest Ceph packages
>>> (Emperor or newer dev packages)?  There have been lots of changes in
>>> the monitor area, including in the upstart scripts, that made many
>>> things more robust in this area.  I did have a cluster a few months
>>> ago that had a flaky monitor that refused to join quorum after
>>> install, and I had to just blow it away and re-install/deploy it and
>>> then it was fine, which I thought was odd.
>>>
>>> Sorry, that's probably not much help.
>>>
>>>  - Travis
>>>
>>> On Thu, Jan 9, 2014 at 12:40 AM, Mordur Ingolfsson <rass@xxxxxxx> wrote:
>>>> Hi Travis,
>>>>
>>>> Did you figure this out? I'm dealing with exactly the same thing over here.
>>>>
>>>> Best,
>>>> Moe
>> Can you share what exactly you are having problems with? ceph-deploy's
>> log output has been much improved and it is super useful to have that
>> when dealing with possible issues.
> I do not, it was long long ago...  And in case it was ambiguous, let
> me explicitly say I was not recommending the use of mkcephfs at all
> (is that even still possible?).  ceph-deploy is certainly the tool to
> use.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
