Re: osds and gateway not coming up on restart

Wido den Hollander <wido@xxxxxxxx> · Thu, 10 Oct 2013 22:46:26 +0200

On 10/10/2013 10:43 PM, Mike O'Toole wrote:
So I took a power hit today and after coming back up 3 of my osds and my
radosgw are not coming back up.  The logs show no clue as to what may
have happened.

When I manually try to restart the gateway I see the following in the logs:

2013-10-10 16:04:23.166046 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:04:45.166193 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:07.166335 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:29.166501 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:51.166638 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:13.166762 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:35.166914 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:57.167055 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize

and then the process dies.

As for the OSDs, there is no logging.  I try to manually start them and
it reports they are already running, although there are no OSD pids on
that server.

$ sudo start ceph-all
start: Job is already running: ceph-all

Can you verify the ceph-osd processes are actually there?
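
A rough sketch of one way to check this. The helper name running_osd_ids
is made up for this example, and it assumes the daemons were started with
the usual "-i <id>" argument:

```shell
# Print the ids of running ceph-osd daemons, given `ps -eo args` output on stdin.
# Assumes each daemon's command line contains "ceph-osd ... -i <numeric id>".
running_osd_ids() {
    sed -n 's/.*ceph-osd .*-i \([0-9][0-9]*\).*/\1/p' | sort -n
}

# Live check on the affected node: no output means no ceph-osd is running.
if command -v ps >/dev/null 2>&1; then
    ps -eo args | running_osd_ids
fi

# On an Upstart-based install the per-daemon job state can also be queried,
# for example (id=0 is a placeholder for one of the missing OSDs):
#   sudo initctl list | grep ceph
#   sudo status ceph-osd id=0
```

If the list is empty while Upstart still says ceph-all is running, only the
umbrella job is up and the individual OSD jobs are not.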

Any ideas where to look for more info on these two issues?  I am running
ceph 0.67.3.

Cluster status :
HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive; 78 pgs
stuck unclean; 16 requests are blocked > 32 sec; 1 osds have slow requests

ceph osd stat
e134: 18 osds: 15 up, 15 in

I assume all the OSDs are on the same machine since you ran "start 
ceph-all" on one node?

Can you still manually mount the filesystems of those OSDs? Could it be
they got corrupted due to the power failure? Are they btrfs or xfs?
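
As a sketch of how to see which OSD data directories are mounted right
now, and with which filesystem: the helper osd_mounts is a name invented
for this example, and it assumes the default data path
/var/lib/ceph/osd/<cluster>-<id>.

```shell
# Given /proc/mounts-style lines on stdin, print "<mountpoint> <fstype>"
# for every filesystem mounted under the default OSD data path.
osd_mounts() {
    awk '$2 ~ /^\/var\/lib\/ceph\/osd\// { print $2, $3 }'
}

# Live check: OSDs that are missing from this list have no data dir mounted.
if [ -r /proc/mounts ]; then
    osd_mounts < /proc/mounts
fi

# For a missing one, try mounting by hand and look at the kernel log.
# /dev/sdX1 is a placeholder, not a device name from this thread:
#   sudo mount /dev/sdX1 /var/lib/ceph/osd/ceph-0
#   dmesg | tail -n 50
# For xfs, a read-only check of the unmounted device:
#   sudo xfs_repair -n /dev/sdX1
```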

The RGW seems to hang on startup because 78 pgs are peering and inactive:
its initial requests to the cluster cannot complete until those pgs become
active again, so it runs into the initialization timeout.
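
To see which pgs and OSDs are involved, something along these lines should
help. This is only a sketch: osds_not_up is a helper name invented here,
and it assumes the "ceph osd stat" output format shown earlier in this
mail.

```shell
# Read `ceph osd stat` output on stdin and print how many OSDs are not up,
# i.e. total minus up. Assumes the "eN: T osds: U up, I in" format.
osds_not_up() {
    sed -n 's/.*: \([0-9][0-9]*\) osds: \([0-9][0-9]*\) up.*/\1 \2/p' |
    while read -r total up; do
        echo $((total - up))
    done
}

# Only run the live queries where the ceph CLI is installed.
if command -v ceph >/dev/null 2>&1; then
    ceph osd stat | osds_not_up     # number of OSDs that are not up
    ceph osd tree                   # which OSDs are down, and on which host
    ceph health detail              # the individual pgs behind HEALTH_WARN
    ceph pg dump_stuck inactive     # the stuck pgs and their acting OSDs
fi
```

With the status from this thread, the helper reports 3, matching the three
OSDs that did not come back. Once those are up and the pgs go active, the
gateway should be able to initialize.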

Wido

Thanks, Mike

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on