Re: osds and gateway not coming up on restart

"Mike O'Toole" <mike.otoole@xxxxxxxxxxx> · Thu, 10 Oct 2013 17:27:50 -0400

Sorry, I didn't mean that it logged nothing; I just saw no clues that were apparent to me. Subsequent restart attempts log nothing. Here are the last few lines:

2013-10-10 15:19:49.832776 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper deleted pipe 0x1e04500
2013-10-10 15:19:49.832785 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.832805 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for dispatch queue
2013-10-10 15:19:49.832829 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: dispatch queue is stopped
2013-10-10 15:19:49.832837 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping accepter thread
2013-10-10 15:19:49.832841 7f945c5d97c0 10 accepter.stop accepter
2013-10-10 15:19:49.832864 7f944c17a700 20 accepter.accepter poll got 1
2013-10-10 15:19:49.832874 7f944c17a700 20 accepter.accepter closing
2013-10-10 15:19:49.832891 7f944c17a700 10 accepter.accepter stopping
2013-10-10 15:19:49.832956 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped accepter thread
2013-10-10 15:19:49.832969 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping reaper thread
2013-10-10 15:19:49.832995 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper_entry done
2013-10-10 15:19:49.833072 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped reaper thread
2013-10-10 15:19:49.833082 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: closing pipes
2013-10-10 15:19:49.833086 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper
2013-10-10 15:19:49.833090 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.833093 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for pipes  to close
2013-10-10 15:19:49.833097 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: done.
2013-10-10 15:19:49.833101 7f945c5d97c0  1 -- 10.10.2.202:6809/15169 shutdown complete.
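
In case it helps anyone following along, here is a minimal sketch (not taken from the Ceph tooling) that turns the "ceph osd stat" line quoted below into a count of OSDs that are not up. The helper name down_osds is made up for illustration, and the sed pattern assumes the 0.67-era output format shown in this thread ("e134: 18 osds: 15 up, 15 in"); other releases may print something different.

```shell
#!/bin/sh
# down_osds is a hypothetical helper, not part of Ceph.
# It takes one line of `ceph osd stat` output as its first argument,
# e.g. "e134: 18 osds: 15 up, 15 in", and prints total minus up.
# It prints nothing if the line does not match the assumed format.
down_osds() {
    printf '%s\n' "$1" |
        # Extract "<total> <up>" from "...: <total> osds: <up> up..."
        sed -n 's/.*: \([0-9][0-9]*\) osds: \([0-9][0-9]*\) up.*/\1 \2/p' |
        while read -r total up; do
            echo $((total - up))
        done
}
```

On a node with an admin keyring this could be called as down_osds "$(ceph osd stat)". For the status in this thread it would report 3, and "ceph osd tree" should then show which OSD ids are marked down.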

> Date: Thu, 10 Oct 2013 23:04:05 +0200
> From: wido@xxxxxxxx
> To: ceph-users@xxxxxxxxxxxxxx
> CC: mike.otoole@xxxxxxxxxxx
> Subject: Re:  osds and gateway not coming up on restart
> 
> On 10/10/2013 11:01 PM, Mike O'Toole wrote:
> >
> > I created them with ceph-deploy and there are no OSD entries in the
> > ceph.conf.  Trying to start them that way doesn't work.
> >
> 
> (bringing discussion back to the list)
> 
> Are you sure there is no logging? There should be logs in /var/log/ceph.
> 
> Wido
> 
> >
> >  > Date: Thu, 10 Oct 2013 22:57:29 +0200
> >  > From: wido@xxxxxxxx
> >  > To: mike.otoole@xxxxxxxxxxx
> >  > Subject: Re:  osds and gateway not coming up on restart
> >  >
> >  > On 10/10/2013 10:54 PM, Mike O'Toole wrote:
> >  > > I verified the OSDs were not running and I issued "sudo stop ceph-all"
> >  > > and "sudo start ceph-all" but nothing comes up. The OSDs are all on the
> >  > > same server. The file systems are xfs and I am able to mount them.
> >  >
> >  > Could you try starting them manually via:
> >  >
> >  > $ service ceph start osd.X
> >  >
> >  > where X is the OSD number of those three OSDs.
> >  >
> >  > If that doesn't work, check the logs of the OSDs to see why they
> >  > aren't starting.
> >  >
> >  > I'm not so familiar with the upstart scripts from Ceph, but I think they
> >  > only start the OSDs when they have been created via ceph-deploy, and
> >  > thus via ceph-disk-prepare and ceph-disk-activate.
> >  >
> >  > Wido
> >  >
> >  > >
> >  > > /dev/sdb1 931G 1.1G 930G 1% /data-1
> >  > > /dev/sdb2 931G 1.1G 930G 1% /data-2
> >  > > /dev/sdb3 931G 1.1G 930G 1% /data-3
> >  > >
> >  > > Interestingly, though, they are empty.
> >  > >
> >  > > > Date: Thu, 10 Oct 2013 22:46:26 +0200
> >  > > > From: wido@xxxxxxxx
> >  > > > To: ceph-users@xxxxxxxxxxxxxx
> >  > > > Subject: Re:  osds and gateway not coming up on restart
> >  > > >
> >  > > > On 10/10/2013 10:43 PM, Mike O'Toole wrote:
> >  > > > > So I took a power hit today and after coming back up 3 of my osds
> >  > > > > and my radosgw are not coming back up. The logs show no clue as to
> >  > > > > what may have happened.
> >  > > > >
> >  > > > > When I manually try to restart the gateway I see the following in
> >  > > > > the logs:
> >  > > > >
> >  > > > > 2013-10-10 16:04:23.166046 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:04:45.166193 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:05:07.166335 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:05:29.166501 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:05:51.166638 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:06:13.166762 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:06:35.166914 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:06:57.167055 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> >  > > > > 2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize
> >  > > > >
> >  > > > > and then the process dies.
> >  > > > >
> >  > > > > As for the OSDs, there is no logging. I try to manually start them
> >  > > > > and it reports they are already running, although there are no OSD
> >  > > > > pids on that server.
> >  > > > >
> >  > > > > $ sudo start ceph-all
> >  > > > > start: Job is already running: ceph-all
> >  > > > >
> >  > > >
> >  > > > Can you verify the ceph-osd processes are actually there?
> >  > > >
> >  > > > > Any ideas where to look for more info on these two issues? I am
> >  > > > > running ceph 0.67.3.
> >  > > > >
> >  > > > > Cluster status :
> >  > > > > HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive;
> >  > > > > 78 pgs stuck unclean; 16 requests are blocked > 32 sec; 1 osds
> >  > > > > have slow requests
> >  > > > >
> >  > > > > ceph osd stat
> >  > > > > e134: 18 osds: 15 up, 15 in
> >  > > > >
> >  > > >
> >  > > > I assume all the OSDs are on the same machine since you ran "start
> >  > > > ceph-all" on one node?
> >  > > >
> >  > > > Can you still manually mount the filesystems of those OSDs? Could
> >  > > > they have been corrupted by the power failure? Are they btrfs or xfs?
> >  > > >
> >  > > > The RGW seems to be blocked from starting because 78 pgs are
> >  > > > peering and inactive.
> >  > > >
> >  > > > Wido
> >  > > >
> >  > > > > Thanks, Mike
> >  > > > >
> >  > > > >
> >  > > > > _______________________________________________
> >  > > > > ceph-users mailing list
> >  > > > > ceph-users@xxxxxxxxxxxxxx
> >  > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >  > > > >
> >  > > >
> >  > > >
> >  > > > --
> >  > > > Wido den Hollander
> >  > > > 42on B.V.
> >  > > >
> >  > > > Phone: +31 (0)20 700 9902
> >  > > > Skype: contact42on
> >  >
> >  >
> 
> 
