I think I just figured out the issue. When I installed Ceph, I had already prepared the three partitions for these OSDs and had fstab entries to mount them. When I first did the install, I didn't realize that Ceph mounts the OSD filesystems for you, so my fstab entries were fighting with Ceph's own mounts. I removed the entries and everything came back up.
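For the archives, here is roughly what the conflict looked like on my end. The device names and mount points are from my df output quoted below; the OSD id in the example is made up, and I'm assuming the standard ceph-disk/udev activation path that ceph-deploy sets up:

    # fstab entries I removed; with them in place the partitions were
    # already mounted at /data-N at boot, before Ceph could claim them:
    /dev/sdb1  /data-1  xfs  defaults  0  0
    /dev/sdb2  /data-2  xfs  defaults  0  0
    /dev/sdb3  /data-3  xfs  defaults  0  0

    # With the entries gone, ceph-disk activates and mounts each OSD
    # partition itself, so the mounts look like this (osd id is an example):
    $ mount | grep ceph
    /dev/sdb1 on /var/lib/ceph/osd/ceph-15 type xfs (rw)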
Thanks for your help!

From: mike.otoole@xxxxxxxxxxx
To: wido@xxxxxxxx; ceph-users@xxxxxxxxxxxxxx
Date: Thu, 10 Oct 2013 17:27:50 -0400
Subject: Re: osds and gateway not coming up on restart

Sorry, I didn't mean it logged nothing; I just saw no clues that were apparent to me. Subsequent restart attempts log nothing. Here are the last few lines:

2013-10-10 15:19:49.832776 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper deleted pipe 0x1e04500
2013-10-10 15:19:49.832785 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.832805 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for dispatch queue
2013-10-10 15:19:49.832829 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: dispatch queue is stopped
2013-10-10 15:19:49.832837 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping accepter thread
2013-10-10 15:19:49.832841 7f945c5d97c0 10 accepter.stop accepter
2013-10-10 15:19:49.832864 7f944c17a700 20 accepter.accepter poll got 1
2013-10-10 15:19:49.832874 7f944c17a700 20 accepter.accepter closing
2013-10-10 15:19:49.832891 7f944c17a700 10 accepter.accepter stopping
2013-10-10 15:19:49.832956 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped accepter thread
2013-10-10 15:19:49.832969 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopping reaper thread
2013-10-10 15:19:49.832995 7f94567ac700 10 -- 10.10.2.202:6809/15169 reaper_entry done
2013-10-10 15:19:49.833072 7f945c5d97c0 20 -- 10.10.2.202:6809/15169 wait: stopped reaper thread
2013-10-10 15:19:49.833082 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: closing pipes
2013-10-10 15:19:49.833086 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper
2013-10-10 15:19:49.833090 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 reaper done
2013-10-10 15:19:49.833093 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: waiting for pipes to close
2013-10-10 15:19:49.833097 7f945c5d97c0 10 -- 10.10.2.202:6809/15169 wait: done.
2013-10-10 15:19:49.833101 7f945c5d97c0 1 -- 10.10.2.202:6809/15169 shutdown complete.

> Date: Thu, 10 Oct 2013 23:04:05 +0200
> From: wido@xxxxxxxx
> To: ceph-users@xxxxxxxxxxxxxx
> CC: mike.otoole@xxxxxxxxxxx
> Subject: Re: osds and gateway not coming up on restart
>
> On 10/10/2013 11:01 PM, Mike O'Toole wrote:
> >
> > I created them with ceph-deploy and there are no OSD entries in the
> > ceph.conf. Trying to start them that way doesn't work.
> >
>
> (bringing discussion back to the list)
>
> Are you sure there is no logging? Because there should be in /var/log/ceph
>
> Wido
>
> > > Date: Thu, 10 Oct 2013 22:57:29 +0200
> > > From: wido@xxxxxxxx
> > > To: mike.otoole@xxxxxxxxxxx
> > > Subject: Re: osds and gateway not coming up on restart
> > >
> > > On 10/10/2013 10:54 PM, Mike O'Toole wrote:
> > > > I verified the OSDs were not running and I issued "sudo stop ceph-all"
> > > > and "sudo start ceph-all", but nothing comes up. The OSDs are all on the
> > > > same server. The file systems are xfs and I am able to mount them.
> > >
> > > Could you try starting them manually via:
> > >
> > > $ service ceph start osd.X
> > >
> > > where X is the OSD number of those three OSDs.
> > >
> > > If that doesn't work, check the OSD logs to see why they aren't starting.
> > >
> > > I'm not so familiar with the upstart scripts from Ceph, but I think it
> > > only starts the OSDs when they have been created via ceph-deploy, thus
> > > ceph-disk-prepare and ceph-disk-activate.
> > >
> > > Wido
> > >
> > > > /dev/sdb1  931G  1.1G  930G  1%  /data-1
> > > > /dev/sdb2  931G  1.1G  930G  1%  /data-2
> > > > /dev/sdb3  931G  1.1G  930G  1%  /data-3
> > > >
> > > > Interestingly, though, they are empty.
> > > >
> > > > > Date: Thu, 10 Oct 2013 22:46:26 +0200
> > > > > From: wido@xxxxxxxx
> > > > > To: ceph-users@xxxxxxxxxxxxxx
> > > > > Subject: Re: osds and gateway not coming up on restart
> > > > >
> > > > > On 10/10/2013 10:43 PM, Mike O'Toole wrote:
> > > > > > So I took a power hit today, and after coming back up, 3 of my OSDs
> > > > > > and my radosgw are not coming back up. The logs show no clue as to
> > > > > > what may have happened.
> > > > > >
> > > > > > When I manually try to restart the gateway, I see the following in
> > > > > > the logs:
> > > > > >
> > > > > > 2013-10-10 16:04:23.166046 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:04:45.166193 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:07.166335 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:29.166501 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:05:51.166638 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:13.166762 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:35.166914 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:06:57.167055 7f8480d9a700 2 RGWDataChangesLog::ChangesRenewThread: start
> > > > > > 2013-10-10 16:07:10.196475 7f848535c700 -1 Initialization timeout, failed to initialize
> > > > > >
> > > > > > and then the process dies.
> > > > > >
> > > > > > As for the OSDs, there is no logging. I try to manually start them,
> > > > > > and it reports they are already running, although there are no OSD
> > > > > > pids on that server.
> > > > > >
> > > > > > $ sudo start ceph-all
> > > > > > start: Job is already running: ceph-all
> > > > >
> > > > > Can you verify the ceph-osd processes are actually there?
> > > > >
> > > > > > Any ideas where to look for more info on these two issues? I am
> > > > > > running ceph 0.67.3.
> > > > > >
> > > > > > Cluster status:
> > > > > > HEALTH_WARN 78 pgs down; 78 pgs peering; 78 pgs stuck inactive;
> > > > > > 78 pgs stuck unclean; 16 requests are blocked > 32 sec; 1 osds
> > > > > > have slow requests
> > > > > >
> > > > > > ceph osd stat
> > > > > > e134: 18 osds: 15 up, 15 in
> > > > >
> > > > > I assume all the OSDs are on the same machine, since you ran "start
> > > > > ceph-all" on one node?
> > > > >
> > > > > Can you still manually mount the filesystems of those OSDs? Could it
> > > > > be they got corrupted due to the power failure? btrfs? xfs?
> > > > >
> > > > > The RGW seems to block because 78 pgs are peering and inactive,
> > > > > which keeps the RGW from starting.
> > > > >
> > > > > Wido
> > > > >
> > > > > > Thanks, Mike
>
> --
> Wido den Hollander
> 42on B.V.
>
> Phone: +31 (0)20 700 9902
> Skype: contact42on
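PS, for anyone who lands on this thread later: here are the checks Wido suggested, collected in one place. The OSD id is just an example; substitute your own:

    $ ps aux | grep ceph-osd              # are the ceph-osd processes really there?
    $ sudo service ceph start osd.15      # start a single OSD by id
    $ tail /var/log/ceph/ceph-osd.15.log  # if it won't start, the reason should be here
    $ ceph osd stat                       # all OSDs back? e.g. "18 osds: 18 up, 18 in"
    $ ceph health                         # once the pgs leave peering, the RGW can start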