So I took a power hit today, and after coming back up, 3 of my OSDs and my radosgw are not coming back up. The logs give no clue as to what may have happened.
When I manually try to restart the gateway, I see the following in the logs, and then the process dies:

2013-10-10 16:04:23.166046 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:04:45.166193 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:07.166335 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:29.166501 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:51.166638 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:13.166762 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:35.166914 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:57.167055 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize

As for the OSDs, there is no logging at all. When I try to start them manually, it reports they are already running, but there are no OSD pids on that server:

$ sudo start ceph-all
start: Job is already running: ceph-all

Any ideas where to look for more info on these two issues? I am running ceph 0.67.3.

Cluster status:

HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive; 78 pgs stuck unclean; 16 requests are blocked > 32 sec; 1 osds have slow requests

$ ceph osd stat
e134: 18 osds: 15 up, 15 in

Thanks,
Mike