Re: osds and gateway not coming up on restart

On 10/10/2013 11:01 PM, Mike O'Toole wrote:

I created them with ceph-deploy and there are no OSD entries in the
ceph.conf. Trying to start them that way doesn't work.
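(For reference, an OSD that is started from ceph.conf would need a
sysvinit-style section roughly like the one below; the hostname is only a
placeholder.)

[osd.0]
    host = osd-server-1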


(bringing discussion back to the list)

Are you sure there is no logging? There should be logs in /var/log/ceph.
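For example (default log locations; the osd id below is just an
illustration, use your own):

$ ls -l /var/log/ceph/
$ tail -n 50 /var/log/ceph/ceph-osd.0.log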

Wido


 > Date: Thu, 10 Oct 2013 22:57:29 +0200
 > From: wido@xxxxxxxx
 > To: mike.otoole@xxxxxxxxxxx
 > Subject: Re:  osds and gateway not coming up on restart
 >
 > On 10/10/2013 10:54 PM, Mike O'Toole wrote:
 > > I verified the OSDs were not running and I issued "sudo stop ceph-all"
 > > and "sudo start ceph-all", but nothing comes up. The OSDs are all on
 > > the same server. The file systems are xfs and I am able to mount them.
 >
 > Could you try starting them manually via:
 >
 > $ service ceph start osd.X
 >
 > where X is the OSD number of those three OSDs.
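 >
 > For example, if the three OSDs are osd.0, osd.1 and osd.2 (substitute
 > your actual ids):
 >
 > $ service ceph start osd.0
 > $ service ceph start osd.1
 > $ service ceph start osd.2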
 >
 > If that doesn't work, check the OSD logs to see why they aren't
 > starting.
 >
 > I'm not that familiar with Ceph's upstart scripts, but I think they
 > only start OSDs that were created via ceph-deploy, i.e. through
 > ceph-disk prepare and ceph-disk activate.
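 >
 > Roughly, ceph-deploy runs something like this under the hood (the
 > device name here is only a placeholder):
 >
 > $ ceph-disk prepare /dev/sdb1
 > $ ceph-disk activate /dev/sdb1
 >
 > As far as I know, activate is what marks the OSD data dir so upstart
 > can find and start it.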
 >
 > Wido
 >
 > >
 > > /dev/sdb1 931G 1.1G 930G 1% /data-1
 > > /dev/sdb2 931G 1.1G 930G 1% /data-2
 > > /dev/sdb3 931G 1.1G 930G 1% /data-3
 > >
 > > Interestingly, though, they are empty.
 > >
 > > > Date: Thu, 10 Oct 2013 22:46:26 +0200
 > > > From: wido@xxxxxxxx
 > > > To: ceph-users@xxxxxxxxxxxxxx
 > > > Subject: Re:  osds and gateway not coming up on restart
 > > >
 > > > On 10/10/2013 10:43 PM, Mike O'Toole wrote:
 > > > > So I took a power hit today and after coming back up 3 of my osds
 > > > > and my radosgw are not coming back up. The logs show no clue as to
 > > > > what may have happened.
 > > > >
 > > > > When I manually try to restart the gateway I see the following in
 > > > > the logs:
 > > > >
 > > > > 2013-10-10 16:04:23.166046 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:04:45.166193 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:05:07.166335 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:05:29.166501 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:05:51.166638 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:06:13.166762 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:06:35.166914 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:06:57.167055 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
 > > > > 2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize
 > > > >
 > > > > and then the process dies.
 > > > >
 > > > > As for the OSDs, there is no logging. I try to manually start them
 > > > > and it reports they are already running, but there are no OSD pids
 > > > > on that server.
 > > > >
 > > > > $ sudo start ceph-all
 > > > > start: Job is already running: ceph-all
 > > > >
 > > >
 > > > Can you verify the ceph-osd processes are actually there?
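 > > >
 > > > For example, with standard tools (nothing Ceph-specific here):
 > > >
 > > > $ ps aux | grep ceph-osd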
 > > >
 > > > > Any ideas where to look for more info on these two issues? I am
 > > > > running ceph 0.67.3.
 > > > >
 > > > > Cluster status :
 > > > > HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive;
 > > > > 78 pgs stuck unclean; 16 requests are blocked > 32 sec; 1 osds
 > > > > have slow requests
 > > > >
 > > > > ceph osd stat
 > > > > e134: 18 osds: 15 up, 15 in
 > > > >
 > > >
 > > > I assume all the OSDs are on the same machine since you ran "start
 > > > ceph-all" on one node?
 > > >
 > > > Can you still manually mount the filesystems of those OSDs? Could
 > > > it be they got corrupted due to the power failure? btrfs? xfs?
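 > > >
 > > > For an xfs consistency check that does not modify anything, something
 > > > like this should work (device and mount point taken from your df
 > > > output; the filesystem must be unmounted first):
 > > >
 > > > $ sudo umount /data-1
 > > > $ sudo xfs_repair -n /dev/sdb1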
 > > >
 > > > The RGW seems to hang because 78 pgs are peering and inactive,
 > > > which prevents the RGW from completing its startup.
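 > > >
 > > > To see which pgs are stuck, and to get more detail out of the RGW,
 > > > something like this should help (the RGW client name below is a
 > > > guess, check your ceph.conf):
 > > >
 > > > $ ceph health detail
 > > > $ ceph pg dump_stuck inactive
 > > > $ radosgw -d -n client.radosgw.gateway --debug-rgw=20 --debug-ms=1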
 > > >
 > > > Wido
 > > >
 > > > > Thanks, Mike
 > > > >
 > > > >
 >
 >


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



