Yep, that looks like http://tracker.ceph.com/issues/7093, which is fixed in dumpling and most of the dev releases since emperor. ;) I also cherry-picked the fix to the emperor branch, and it will be included whenever we do another point release of that.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Mar 25, 2014 at 6:39 PM, Quenten Grasso <qgrasso@xxxxxxxxxx> wrote:
> Hi Greg,
>
> Restarting the actual service (i.e. service ceph restart osd.50) only takes a few seconds.
>
> Attached is the ceph -w output from just running service ceph restart osd.50.
>
> You can see it marks itself down pretty much straight away. It takes a little while to mark itself up again and finish "recovery".
>
> If I do this to all 12 OSDs, the node goes crazy. It's almost as if the node is CPU-bound, but it has 6 cores, and the load average goes to 300+.
>
> http://pastie.org/pastes/8968950/text?key=0e0bs1ojbm2arnexn52iwq
>
> Regards,
> Quenten
>
> -----Original Message-----
> From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
> Sent: Wednesday, 26 March 2014 2:02 AM
> To: Quenten Grasso
> Cc: Kyle Bader; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
>
> How long does it take for the OSDs to restart? Are you just issuing a restart command via upstart/sysvinit/whatever? How many OSDMaps are generated from the time you issue that command to the time the cluster is healthy again?
>
> This sounds like an issue we had for a while where OSDs would start peering before they had processed the maps they needed to look at; the fix might not have been backported to Emperor. But I'd like to be sure this isn't some other issue you're seeing.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Mar 22, 2014 at 8:16 PM, Quenten Grasso <qgrasso@xxxxxxxxxx> wrote:
>> Hi Kyle,
>>
>> Thanks. I turned on debug ms = 1 and debug osd = 10 and restarted osd.54; here's the log for that one.
>>
>> ceph-osd.54.log.bz2
>> http://www67.zippyshare.com/v/99704627/file.html
>>
>>
>> Strace of osd.53:
>> strace.zip
>> http://www43.zippyshare.com/v/17581165/file.html
>>
>>
>> Thanks,
>> Quenten
>> -----Original Message-----
>> From: Kyle Bader [mailto:kyle.bader@xxxxxxxxx]
>> Sent: Sunday, 23 March 2014 12:10 PM
>> To: Quenten Grasso
>> Subject: Re: OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
>>
>>> Any ideas on why the load average goes so crazy & starts to block IO?
>>
>> Could you turn on "debug ms = 1" and "debug osd = 10" prior to restarting the OSDs on one of your hosts and share the logs so we can take a look?
>>
>> It also might be worthwhile to strace one of the OSDs to try to determine what it's working so hard on, maybe:
>>
>> strace -fc -p <osd pid> -o strace.osd1.log
>>
>> Thanks!
>>
>> --
>>
>> Kyle
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
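
For anyone reproducing Kyle's request: a minimal sketch of the ceph.conf change, assuming the OSDs on the host read /etc/ceph/ceph.conf at startup (the path is an assumption about this setup). Putting the settings in the config file, rather than changing them on the running daemon, means they are still in effect after the restart you are trying to capture. Remove them afterwards, since debug osd = 10 logs heavily.

```ini
# Hypothetical excerpt of /etc/ceph/ceph.conf on the host being tested.
# The [osd] section applies to every OSD that reads this file; a
# per-daemon section such as [osd.54] limits it to a single OSD.
[osd]
    debug ms = 1
    debug osd = 10
```

Then restart as in the thread (service ceph restart osd.54). With default settings the log is written to /var/log/ceph/ceph-osd.54.log.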