On Thu, Oct 17, 2013 at 6:24 PM, Charles 'Boyo <charlesboyo@xxxxxxxxx> wrote:
> Hello list.
>
> I am trying to create a new single-node cluster using the ceph-deploy
> tool, but the 'mon create' step keeps failing, apparently because the
> 'ceph' cluster name is hardwired into the /etc/init.d/ceph rc script
> or, more correctly, the rc script does not have any support for
> "--cluster <name>". Has anyone else experienced this, or am I doing
> something wrong?

Hi Charles,

I think it *appears* to be that way, but what actually happens is that
ceph-deploy points to the right configuration file with the `-c` flag:
when you pass `--cluster`, ceph-deploy translates it into the matching
configuration path. That is what your logs show and what I would
expect too:

[ohafia][INFO ] Running command: /sbin/service ceph -c /etc/ceph/zfsbackup.conf start mon.ohafia

What I believe is going on here is the error checking that ceph-deploy
does to make sure your mons are running correctly. This functionality
appeared in 1.2.7, and the reason it is failing is that the check does
not use the cluster argument (and it should). Your logs are telling
here:

[ohafia][INFO ] Running command: ceph --admin-daemon /var/run/ceph/ceph-mon.ohafia.asok mon_status
[ohafia][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

As you can see, `/var/run/ceph/` should actually have a
zfsbackup-mon.ohafia.asok file, not a ceph-mon.ohafia.asok one. Your
mon *should* be running, and the invocations ceph-deploy uses to start
it are correct; however, you will not get the nice reporting of what
is going on and whether there are any errors. That is clearly a bug; I
opened an issue (http://tracker.ceph.com/issues/6587) to fix it. In
the meantime you can check the mon yourself (see the workaround after
your 'mon create' logs below).

> Machine is a fresh install of CentOS 6.4 x86_64 with ceph/ceph-noarch
> dumpling (http://ceph.com/rpm-dumpling/el6/$basearch) repo.
> ceph-deploy version 1.2.7
>
> Attempting to deploy a ceph cluster called zfsbackup on the local
> machine called ohafia:
>
> [root@ohafia ceph-deploy]# ceph-deploy --cluster=zfsbackup new ohafia:10.50.1.24
> [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy --cluster=zfsbackup new ohafia:10.50.1.24
> [ceph_deploy.new][DEBUG ] Creating new cluster named zfsbackup
> [ceph_deploy.new][DEBUG ] Resolving host 10.50.1.24
> [ceph_deploy.new][DEBUG ] Monitor ohafia at 10.50.1.24
> [ceph_deploy.new][DEBUG ] Monitor initial members are ['ohafia']
> [ceph_deploy.new][DEBUG ] Monitor addrs are ['10.50.1.24']
> [ceph_deploy.new][DEBUG ] Creating a random mon key...
> [ceph_deploy.new][DEBUG ] Writing initial config to zfsbackup.conf...
> [ceph_deploy.new][DEBUG ] Writing monitor keyring to zfsbackup.mon.keyring...
>
> [root@ohafia ceph-deploy]# ceph-deploy --cluster=zfsbackup install ohafia
> [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy --cluster=zfsbackup install ohafia
> [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster zfsbackup hosts ohafia
> [ceph_deploy.install][DEBUG ] Detecting platform for host ohafia ...
> [ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
> [ceph_deploy.install][INFO ] Distro info: CentOS 6.4 Final
> [ohafia][INFO ] installing ceph on ohafia
> <snip - epel and ceph repos installed okay>
> [ohafia][INFO ] Running command: yum -y -q install ceph
> [ohafia][INFO ] Package ceph-0.67.4-0.el6.x86_64 already installed and latest version
> [ohafia][INFO ] Running command: ceph --version
> [ohafia][INFO ] ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
>
> [root@ohafia ceph-deploy]# ceph-deploy --cluster=zfsbackup mon create
> [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy --cluster=zfsbackup mon create
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster zfsbackup hosts ohafia
> [ceph_deploy.mon][DEBUG ] detecting platform for host ohafia ...
> [ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
> [ceph_deploy.mon][INFO ] distro info: CentOS 6.4 Final
> [ohafia][DEBUG ] determining if provided host has same hostname in remote
> [ohafia][DEBUG ] deploying mon to ohafia
> [ohafia][DEBUG ] remote hostname: ohafia
> [ohafia][INFO ] write cluster configuration to /etc/ceph/{cluster}.conf
> [ohafia][INFO ] creating path: /var/lib/ceph/mon/zfsbackup-ohafia
> [ohafia][DEBUG ] checking for done path: /var/lib/ceph/mon/zfsbackup-ohafia/done
> [ohafia][DEBUG ] done path does not exist: /var/lib/ceph/mon/zfsbackup-ohafia/done
> [ohafia][INFO ] creating keyring file: /var/lib/ceph/tmp/zfsbackup-ohafia.mon.keyring
> [ohafia][INFO ] create the monitor keyring file
> [ohafia][INFO ] Running command: ceph-mon --cluster zfsbackup --mkfs -i ohafia --keyring /var/lib/ceph/tmp/zfsbackup-ohafia.mon.keyring
> [ohafia][INFO ] ceph-mon: mon.noname-a 10.50.1.24:6789/0 is local, renaming to mon.ohafia
> [ohafia][INFO ] ceph-mon: set fsid to eb552fab-98f9-4c8a-baef-b061d9163262
> [ohafia][INFO ] ceph-mon: created monfs at /var/lib/ceph/mon/zfsbackup-ohafia for mon.ohafia
> [ohafia][INFO ] unlinking keyring file /var/lib/ceph/tmp/zfsbackup-ohafia.mon.keyring
> [ohafia][INFO ] create a done file to avoid re-doing the mon deployment
> [ohafia][INFO ] create the init path if it does not exist
> [ohafia][INFO ] locating `service` executable...
> [ohafia][INFO ] found `service` executable: /sbin/service
> [ohafia][INFO ] Running command: /sbin/service ceph -c /etc/ceph/zfsbackup.conf start mon.ohafia
> [ohafia][INFO ] Running command: ceph --admin-daemon /var/run/ceph/ceph-mon.ohafia.asok mon_status
> [ohafia][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
> [ohafia][WARNIN] monitor: mon.ohafia, might not be running yet
> [ohafia][INFO ] Running command: ceph --admin-daemon /var/run/ceph/ceph-mon.ohafia.asok mon_status
> [ohafia][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
> [ohafia][WARNIN] monitor ohafia does not exist in monmap
> [ohafia][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
> [ohafia][WARNIN] monitors may not be able to form quorum
>
> Note that the service invocation picks up the config file but
> apparently that is not sufficient to locate the mon correctly.
> The rc script contains references to "/var/lib/ceph/mon/ceph-$id" and
> "/var/lib/ceph/osd/ceph-$id". Changing those to read 'zfsbackup-$id'
> did not help.
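Until the tracker issue above is fixed, you can run the same check
ceph-deploy attempts, but against the correct socket. A minimal
sketch, assuming the default admin socket naming of
{cluster}-mon.{id}.asok, which for your cluster would be
zfsbackup-mon.ohafia.asok:

# query the mon through the cluster-named socket, not the ceph-* one
ceph --admin-daemon /var/run/ceph/zfsbackup-mon.ohafia.asok mon_status

If the mon is up, that should print its current state (probing,
electing, leader, ...) and the monmap it knows about. The [Errno 2]
errors in your output just mean ceph-deploy was pointed at a socket
that does not exist.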
>
> I am able to start the mon directly by executing:
>
> [root@ohafia ceph-deploy]# ceph-mon -c /etc/ceph/zfsbackup.conf --cluster zfsbackup -i ohafia -d
> 2013-10-17 23:06:08.222103 7f0a61ffc7a0 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-mon, pid 14615
> starting mon.ohafia rank 0 at 10.50.1.24:6789/0 mon_data /var/lib/ceph/mon/zfsbackup-ohafia fsid eb552fab-98f9-4c8a-baef-b061d9163262
> 2013-10-17 23:06:08.273868 7f0a61ffc7a0 1 mon.ohafia@-1(probing) e1 preinit fsid eb552fab-98f9-4c8a-baef-b061d9163262
> 2013-10-17 23:06:08.274196 7f0a61ffc7a0 1 mon.ohafia@-1(probing).paxosservice(pgmap 1..2) refresh upgraded, format 0 -> 1
> 2013-10-17 23:06:08.274208 7f0a61ffc7a0 1 mon.ohafia@-1(probing).pg v0 on_upgrade discarding in-core PGMap
> 2013-10-17 23:06:08.275964 7f0a61ffc7a0 1 mon.ohafia@-1(probing).paxosservice(auth 1..3) refresh upgraded, format 0 -> 1
> 2013-10-17 23:06:08.301696 7f0a61ffc7a0 0 mon.ohafia@-1(probing) e1 my rank is now 0 (was -1)
> 2013-10-17 23:06:08.301720 7f0a61ffc7a0 1 mon.ohafia@0(probing) e1 win_standalone_election
> 2013-10-17 23:06:08.321949 7f0a61ffc7a0 0 log [INF] : mon.ohafia@0 won leader election with quorum 0
> 2013-10-17 23:06:08.322358 7f0a61ffc7a0 0 log [INF] : pgmap v2: 192 pgs: 192 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
> 2013-10-17 23:06:08.322447 7f0a61ffc7a0 0 log [INF] : mdsmap e1: 0/0/1 up
> 2013-10-17 23:06:08.322530 7f0a61ffc7a0 0 log [INF] : osdmap e1: 0 osds: 0 up, 0 in
> 2013-10-17 23:06:08.322650 7f0a61ffc7a0 0 log [INF] : monmap e1: 1 mons at {ohafia=10.50.1.24:6789/0}
> 2013-10-17 23:06:08.322805 7f0a5e106700 1 mon.ohafia@0(leader).paxos(paxos active c 1..13) is_readable now=2013-10-17 23:06:08.322806 lease_expire=0.000000 has v0 lc 13
> <snip>
> ^C2013-10-17 23:07:49.709603 7f0a5cd04700 -1 mon.ohafia@0(leader) e1 *** Got Signal Interrupt ***
> 2013-10-17 23:07:49.709638 7f0a5cd04700 1 mon.ohafia@0(leader) e1 shutdown
> 2013-10-17 23:07:49.709716 7f0a5cd04700 0 quorum service shutdown
> 2013-10-17 23:07:49.709720 7f0a5cd04700 0 mon.ohafia@0(shutdown).health(1) HealthMonitor::service_shutdown 1 services
> 2013-10-17 23:07:49.709723 7f0a5cd04700 0 quorum service shutdown
>
> In addition, the following files are from my /var/log/ceph:
> [root@ohafia ceph-deploy]# ls -l /var/log/ceph/
> total 4
> -rw-r--r--. 1 root root   0 Oct 17 23:08 ceph-client.admin.log
> -rw-r--r--. 1 root root   0 Oct 17 23:08 ceph-mon.ohafia.log
> -rw-------. 1 root root 834 Oct 17 23:06 zfsbackup.log
>
> The first two were created by the attempt to start the monitor with
> the service executable (I removed them and tried that manually and
> they returned); the last one is from my direct execution of ceph-mon.
>
> Does this mean there is no expectation to manage or deploy
> alternatively named clusters with the ceph rc script?
>
> Charles
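One more thing: the `public_addr`/`public_network` warnings at the end
of your 'mon create' output come from the same ceph-deploy checks.
Your manual run shows the mon elects itself and comes up fine, but you
can make the warnings go away by declaring the network in your config.
A minimal sketch of what to add to /etc/ceph/zfsbackup.conf, assuming
10.50.1.24 sits on a /24 network (the mask is a guess on my part,
adjust it to your actual subnet):

[global]
public network = 10.50.1.0/24

Alternatively, 'public addr = 10.50.1.24' under a [mon.ohafia] section
pins the address for just that monitor.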