Re: Defective ceph startup script

Greg Chavez <greg.chavez@xxxxxxxxx> · Wed, 31 Jul 2013 16:42:16 -0400

After I did what Eric Eastman suggested, my mon and osd sockets showed up in /var/run/ceph:
root@kvm-cs-sn-10i:/etc/ceph# ls /var/run/ceph/
ceph-osd.0.asok  ceph-osd.1.asok  ceph-osd.2.asok  ceph-osd.3.asok  ceph-osd.4.asok  ceph-osd.5.asok  ceph-osd.6.asok  ceph-osd.7.asok

However, while the osd daemons came back online, the mon did not. As it happened, the cause is covered in another thread from today (Subject: Problem with MON after reboot). The solution is to upgrade and restart the other mon nodes. This worked.

Now the status/stop/start commands work every time. Somewhere along the line this got goofed up, and the osd and mon sockets either weren't created or were deleted. I started my cluster with a devel version of cuttlefish, so who knows?

Craig, that's good advice re: starting the mon daemons first, but it does no good if the sockets are missing from /var/run/ceph. I'll keep an eye on these directories going forward to make sure they don't get lost again.
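
For anyone who wants to automate that check, here is a rough sketch of a shell helper that reports which expected admin sockets are absent. It assumes the ceph-<type>.<id>.asok naming seen in the listing above; the function name missing_asoks and the daemon names passed to it are made up for illustration, and it only tests that the path exists, not that a daemon is actually listening on it.

```shell
#!/bin/sh
# Hypothetical helper: report expected admin sockets that are missing.
# Usage: missing_asoks DIR NAME...
#   e.g. missing_asoks /var/run/ceph osd.0 osd.1
# Prints one "missing: <path>" line per absent file; returns 1 if any are absent.
missing_asoks() {
    dir=$1
    shift
    rc=0
    for name in "$@"; do
        # Assumed naming pattern: ceph-<type>.<id>.asok
        path="$dir/ceph-$name.asok"
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            rc=1
        fi
    done
    return $rc
}
```

Something like "missing_asoks /var/run/ceph osd.0 osd.1" could then be run from cron or after each upgrade, with the list of names adjusted to whatever daemons the node is supposed to host.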

Thanks, everyone, for your help. Now I hope to engage in some drama-free upgrading on my osd-only nodes. Ceph is great!

On Wed, Jul 31, 2013 at 4:31 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:

You do need to use the stop script, not service stop. If you use service stop, Upstart will restart the service. It's OK for start and restart, because that's what you want anyway, but service stop is effectively a restart.

I wouldn't recommend doing stop ceph-all and start ceph-all after an upgrade anyway, at least not with the latest 0.61 upgrades. Due to the MON issues between 0.61.4, 0.61.5, and 0.61.6, it seemed safer to follow the major version upgrade procedure (http://ceph.com/docs/next/install/upgrading-ceph/). So I've been restarting MON on all nodes, then all OSDs on all nodes, then the remaining services.

That said, stop ceph-all should stop all the daemons. I just wouldn't use this upgrade procedure.
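
The staged order described above (MONs on every node, then OSDs on every node, then whatever is left) could be sketched per node roughly as follows. This assumes the per-type Upstart jobs ceph-mon-all, ceph-osd-all, and ceph-mds-all are installed alongside ceph-all; mapping "the remaining services" to ceph-mds-all is a guess, and the restart_stage function and RUNNER variable are hypothetical names used only for this example.

```shell
#!/bin/sh
# Hypothetical helper: restart one class of Ceph daemon on this node.
# Run stage "mon" on every node first, then "osd" on every node, then "rest".
# Set RUNNER=echo for a dry run that only prints the command.
restart_stage() {
    case "$1" in
        mon)  job=ceph-mon-all ;;
        osd)  job=ceph-osd-all ;;
        rest) job=ceph-mds-all ;;   # assumption: MDS is the only remaining service
        *)    echo "usage: restart_stage mon|osd|rest" >&2
              return 2 ;;
    esac
    # Upstart's restart command, optionally prefixed by RUNNER.
    ${RUNNER:-} restart "$job"
}
```

With RUNNER=echo, "restart_stage mon" just prints the command it would run, which makes it easy to confirm the order before touching a live cluster.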

On all my cluster nodes to upgrade from 0.61.5 to 0.61.7 and then noticed that some of my systems did not restart all the daemons.  I tried:

stop ceph-all

start ceph-all

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
\*..+.-
--Greg Chavez
+//..;};
