Re: osds and gateway not coming up on restart

Sorry, I didn't mean it logged nothing; I just saw no clues that were apparent to me. Subsequent restart attempts log nothing. Here are the last few lines:

2013-10-10 15:19:49.832776 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper deleted pipe 0x1e04500
2013-10-10 15:19:49.832785 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.832805 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for dispatch queue
2013-10-10 15:19:49.832829 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: dispatch queue is stopped
2013-10-10 15:19:49.832837 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping accepter thread
2013-10-10 15:19:49.832841 7f945c5d97c0 10 accepter.stop accepter
2013-10-10 15:19:49.832864 7f944c17a700 20 accepter.accepter poll got 1
2013-10-10 15:19:49.832874 7f944c17a700 20 accepter.accepter closing
2013-10-10 15:19:49.832891 7f944c17a700 10 accepter.accepter stopping
2013-10-10 15:19:49.832956 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped accepter thread
2013-10-10 15:19:49.832969 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping reaper thread
2013-10-10 15:19:49.832995 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper_entry done
2013-10-10 15:19:49.833072 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped reaper thread
2013-10-10 15:19:49.833082 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: closing pipes
2013-10-10 15:19:49.833086 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper
2013-10-10 15:19:49.833090 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.833093 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for pipes  to close
2013-10-10 15:19:49.833097 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: done.
2013-10-10 15:19:49.833101 7f945c5d97c0  1 -- 10.10.2.202:6809/15169 shutdown complete.
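
For anyone who wants to collect the same kind of excerpt from every OSD on a host, a rough sketch follows. It assumes the default log directory /var/log/ceph and the default ceph-osd.<id>.log naming; the function name osd_log_tails and its two arguments are made up for illustration.

```shell
# Print the last N lines of every OSD log found in a directory.
# Assumes default log naming (ceph-osd.<id>.log); adjust the glob if
# the "log file" setting has been overridden in ceph.conf.
osd_log_tails() {
    log_dir="${1:-/var/log/ceph}"
    lines="${2:-20}"
    for f in "$log_dir"/ceph-osd.*.log; do
        [ -e "$f" ] || continue      # glob matched nothing, skip
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
}

# Example: osd_log_tails /var/log/ceph 20
```

If an OSD's log ends in "shutdown complete." like the one above, the daemon exited cleanly rather than crashing, so the reason for the exit is probably further up in the log.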




> Date: Thu, 10 Oct 2013 23:04:05 +0200
> From: wido@xxxxxxxx
> To: ceph-users@xxxxxxxxxxxxxx
> CC: mike.otoole@xxxxxxxxxxx
> Subject: Re: osds and gateway not coming up on restart
>
> On 10/10/2013 11:01 PM, Mike O'Toole wrote:
> >
> > I created them with ceph-deploy and there are no OSD entries in the
> > ceph.conf. Trying to start them that way doesn't work.
> >
>
> (bringing discussion back to the list)
>
> Are you sure there is no logging? Because there should be in /var/log/ceph
>
> Wido
>
> >
> > > Date: Thu, 10 Oct 2013 22:57:29 +0200
> > > From: wido@xxxxxxxx
> > > To: mike.otoole@xxxxxxxxxxx
> > > Subject: Re: osds and gateway not coming up on restart
> > >
> > > On 10/10/2013 10:54 PM, Mike O'Toole wrote:
> > > > I verified the OSDs were not running and I issued "sudo stop ceph-all"
> > > > and "sudo start ceph-all" but nothing comes up. The OSDs are all on the
> > > > same server. The file systems are xfs and I am able to mount them.
> > >
> > > Could you try starting them manually via:
> > >
> > > $ service ceph start osd.X
> > >
> > > where X is the OSD number of those three OSDs.
> > >
> > > > If that doesn't work, check the OSD logs to see why they aren't starting.
> > >
> > > > I'm not so familiar with the upstart scripts from Ceph, but I think it
> > > > only starts the OSDs when they have been created via ceph-deploy, and thus
> > > > via ceph-disk-prepare and ceph-disk-activate.
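
A minimal sketch of what starting individual OSDs under upstart might look like, assuming the ceph-osd upstart job takes an id= argument as described in the Ceph upstart documentation. The ids 0, 1 and 2 are placeholders, since the thread does not say which three OSDs are down, and osd_start_cmds is a hypothetical helper that only prints the commands so they can be reviewed before running them.

```shell
# Print (do not run) the upstart start command for each OSD id given.
# The ids passed in are placeholders; substitute the ids of the OSDs
# that are actually down.
osd_start_cmds() {
    for id in "$@"; do
        echo "sudo start ceph-osd id=$id"
    done
}

osd_start_cmds 0 1 2
```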
> > >
> > > Wido
> > >
> > > >
> > > > /dev/sdb1 931G 1.1G 930G 1% /data-1
> > > > /dev/sdb2 931G 1.1G 930G 1% /data-2
> > > > /dev/sdb3 931G 1.1G 930G 1% /data-3
> > > >
> > > > Interestingly though they are empty.
> > > >
> > > > > Date: Thu, 10 Oct 2013 22:46:26 +0200
> > > > > From: wido@xxxxxxxx
> > > > > To: ceph-users@xxxxxxxxxxxxxx
> > > > > Subject: Re: osds and gateway not coming up on restart
> > > > >
> > > > > On 10/10/2013 10:43 PM, Mike O'Toole wrote:
> > > > > > So I took a power hit today and after coming back up 3 of my osds and my
> > > > > > radosgw are not coming back up. The logs show no clue as to what may
> > > > > > have happened.
> > > > > >
> > > > > > When I manually try to restart the gateway I see the following in the logs:
> > > > > >
> > > > > > 2013-10-10 16:04:23.166046 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:04:45.166193 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:07.166335 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:29.166501 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:51.166638 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:13.166762 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:35.166914 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:57.167055 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize
> > > > > >
> > > > > > and then the process dies.
> > > > > >
> > > > > > As for the OSDs, there is no logging. I try to manually start them and
> > > > > > it reports they are already running, although there are no OSD pids on that
> > > > > > server.
> > > > > >
> > > > > > $ sudo start ceph-all
> > > > > > start: Job is already running: ceph-all
> > > > > >
> > > > >
> > > > > Can you verify the ceph-osd processes are actually there?
> > > > >
> > > > > > Any ideas where to look for more info on these two issues? I am running
> > > > > > ceph 0.67.3.
> > > > > >
> > > > > > Cluster status :
> > > > > > HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive; 78 pgs stuck unclean; 16 requests are blocked > 32 sec; 1 osds have slow requests
> > > > > >
> > > > > > ceph osd stat
> > > > > > e134: 18 osds: 15 up, 15 in
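
The gap between total and up OSDs in that line is the number to watch. Below is a small sketch for pulling it out of a `ceph osd stat` line, assuming the output format shown above (other Ceph releases may print it differently); osds_down is a hypothetical helper name.

```shell
# Compute how many OSDs are down from a line such as
# "e134: 18 osds: 15 up, 15 in". Prints nothing if the line does not match.
osds_down() {
    echo "$1" |
        sed -n 's/.*: \([0-9][0-9]*\) osds: \([0-9][0-9]*\) up.*/\1 \2/p' |
        {
            if read -r total up; then
                echo $((total - up))
            fi
        }
}

osds_down "e134: 18 osds: 15 up, 15 in"   # prints 3
```

On a live cluster the argument would be `"$(ceph osd stat)"`, and `ceph osd tree` should show which OSD ids are marked down.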
> > > > > >
> > > > >
> > > > > I assume all the OSDs are on the same machine since you ran "start
> > > > > ceph-all" on one node?
> > > > >
> > > > > Can you still manually mount the filesystems of those OSDs? Could it be
> > > > > they got corrupted due to the power failure? btrfs? xfs?
> > > > >
> > > > > The RGW seems to block because 78 pgs are peering and inactive,
> > > > > which prevents the RGW from starting.
> > > > >
> > > > > Wido
> > > > >
> > > > > > Thanks, Mike
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@xxxxxxxxxxxxxx
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Wido den Hollander
> > > > > 42on B.V.
> > > > >
> > > > > Phone: +31 (0)20 700 9902
> > > > > Skype: contact42on
> > >
> > >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
