Re: osds and gateway not coming up on restart


 



On 10/10/2013 10:43 PM, Mike O'Toole wrote:
So I took a power hit today, and after the machines came back up, three of
my OSDs and my radosgw are not coming back up.  The logs show no clue as
to what may have happened.

When I manually try to restart the gateway I see the following in the logs:

2013-10-10 16:04:23.166046 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:04:45.166193 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:07.166335 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:29.166501 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:05:51.166638 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:13.166762 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:35.166914 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:06:57.167055 7f8480d9a700  2 RGWDataChangesLog::ChangesRenewThread: start
2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize

and then the process dies.

As for the OSDs, there is no logging.  When I try to start them manually,
it reports they are already running, although there are no OSD pids on
that server.

$ sudo start ceph-all
start: Job is already running: ceph-all


Can you verify the ceph-osd processes are actually there?
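For instance, something like this (a sketch; the initctl line assumes
Ubuntu's upstart, which your "start ceph-all" suggests you are using):

```shell
# Check for running ceph-osd processes; the bracketed pattern stops
# grep from matching its own command line
ps aux | grep '[c]eph-osd' || echo "no ceph-osd processes found"

# Upstart can believe the umbrella 'ceph-all' job is running even when
# the individual daemons have died; list the per-daemon jobs to compare:
# sudo initctl list | grep ceph
```

If ps shows nothing but upstart still reports the job as running, stopping
and then starting ceph-all (or the individual ceph-osd jobs) should clear
the stale job state.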

Any ideas where to look for more info on these two issues?  I am running
ceph 0.67.3.

Cluster status :
HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive; 78 pgs
stuck unclean; 16 requests are blocked > 32 sec; 1 osds have slow requests

ceph osd stat
e134: 18 osds: 15 up, 15 in


I assume all the OSDs are on the same machine since you ran "start ceph-all" on one node?

Can you still manually mount the filesystems of those OSDs? Could it be they got corrupted due to the power failure? Are they btrfs or xfs?
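Something along these lines (a sketch assuming an XFS-backed OSD on the
default /var/lib/ceph layout; the device /dev/sdb1 and the osd id are
placeholders for your own values):

```shell
# Try mounting the OSD's data partition by hand
sudo mount /dev/sdb1 /var/lib/ceph/osd/ceph-0

# If the mount fails or dmesg shows filesystem errors, run a read-only
# check; -n reports problems without modifying the filesystem
sudo umount /var/lib/ceph/osd/ceph-0
sudo xfs_repair -n /dev/sdb1
```

If xfs_repair -n reports damage, the journal may need replaying (a plain
mount/umount cycle) before attempting an actual repair.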

The RGW seems to be blocked from starting because 78 PGs are peering and inactive.
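You can see exactly which PGs are stuck, and which OSDs they map to, with
something like (the PG id 2.5 below is just a hypothetical example):

```shell
# Which PGs are unhealthy and why
ceph health detail

# List PGs stuck in the inactive state, with their acting OSD sets
ceph pg dump_stuck inactive

# Query one stuck PG for the reason it cannot finish peering
ceph pg 2.5 query
```

The "down" PGs are most likely the ones whose data lives on the three OSDs
that have not come back, which is why peering cannot complete.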

Wido

Thanks, Mike


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on



