Re: Reducing the impact of OSD restarts (noout ain't uptosnuff)

Nick Fisk <nick@xxxxxxxxxx> · Fri, 12 Feb 2016 16:06:49 -0000

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Balzer
> Sent: 12 February 2016 15:38
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  Reducing the impact of OSD restarts (noout ain't
> uptosnuff)
> 
> On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:
> 
> > Hi,
> >
> > On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > > Hello,
> > >
> > > yesterday I upgraded our busiest (in other words, lethally
> > > overloaded) production cluster to the latest Firefly in preparation
> > > for a Hammer upgrade and then phasing in a cache tier.
> > >
> > > When restarting the OSDs it took 3 minutes (1 minute on an immediate
> > > repeat, to test the impact of primed caches), during which
> > > the cluster crawled to a near standstill and the dreaded slow
> > > requests piled up, causing applications in the VMs to fail.
> > >
> > > I had of course set things to "noout" beforehand, in hopes of
> > > staving off this kind of scenario.
> > >
> > > Note that the other OSDs and their backing storage were NOT
> > > overloaded during that time, only the backing storage of the OSD
> > > being restarted was under duress.
> > >
> > > I was under the (wishful thinking?) impression that with noout set
> > > and a controlled OSD shutdown/restart, operations would be redirected
> > > to the new primary for the duration.
> > > I was prepared for the strain on the restarted OSDs when recovering
> > > those operations (which I also saw); the near screeching halt, not
> > > so much.
> > >
> > > Any thoughts on how to mitigate this further or is this the expected
> > > behavior?
> >
> > I wouldn't use noout in this scenario. It keeps the cluster from
> > recognizing that an OSD is not available; other OSDs will still try to
> > write to that OSD. This is probably the cause of the blocked requests.
> > Redirecting only works if the cluster is able to detect that a PG is
> > degraded.
> >
> Oh well, that of course makes sense, but I found an article stating that it
> would also redirect things, and the recovery activity I saw afterwards
> suggests it did so at some point.

Doesn't noout just stop the crushmap from being modified, and hence prevent
data shuffling? Isn't it nodown that controls whether or not the OSD is
available for IO?

Maybe try the reverse: set noup so that OSDs don't participate in IO, and
then bring them in manually?
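The flag juggling discussed in this thread can be sketched as an ordered command plan. This is a minimal Python sketch under stated assumptions, not a tested maintenance procedure: it only builds the `ceph osd set <flag>` / `ceph osd unset <flag>` invocations around a single OSD restart. Which flags to use (noout alone, noup, or nobackfill/norecover) is exactly the open question here, so they are passed in. The `restart_plan` function and the restart command template are hypothetical; the real restart command depends on the init system in use.

```python
import shlex
from typing import List, Sequence

# Cluster-wide OSD flags mentioned in this thread; each is toggled with
# `ceph osd set <flag>` and cleared with `ceph osd unset <flag>`.
KNOWN_FLAGS = {"noout", "nodown", "noup", "nobackfill", "norecover"}


def restart_plan(osd_id: int, flags: Sequence[str], restart_cmd: str) -> List[List[str]]:
    """Return the ordered argv lists for restarting one OSD.

    `restart_cmd` is a HYPOTHETICAL template (it depends on the init
    system); "{id}" in it is replaced with the OSD id.
    """
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    # Set every requested flag before touching the OSD.
    plan = [["ceph", "osd", "set", f] for f in flags]
    plan.append(shlex.split(restart_cmd.format(id=osd_id)))
    # Clear the flags in reverse order once the OSD is back.
    plan.extend(["ceph", "osd", "unset", f] for f in reversed(flags))
    return plan


if __name__ == "__main__":
    # Assumed sysvinit-style restart command; adjust for your installation.
    for argv in restart_plan(3, ["nobackfill", "norecover"], "service ceph restart osd.{id}"):
        print(shlex.join(argv))  # dry run; pass argv to subprocess.run() to execute
```

Printing instead of executing keeps the sketch a dry run. In practice you would also want to wait for the restarted OSD to rejoin and its PGs to peer before unsetting the flags; that step is deliberately left out here.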

> 
> > If the cluster is aware of the OSD being missing, it could handle the
> > write requests more gracefully. To prevent it from backfilling etc., I
> > prefer to use nobackfill and norecover. They block backfill and recovery
> > at the cluster level, but allow requests to be carried out (at least in
> > my understanding of these flags).
> >
> Yes, I concur and was thinking of that as well. Will give it a spin with the
> upgrade to Hammer.
> 
> > 'noout' is fine for large-scale cluster maintenance, since it keeps
> > the cluster from backfilling. I've used it when I had to power down our
> > complete cluster.
> >
> Guess with my other, less busy clusters, this never showed up on my radar.
> 
> Regards,
> 
> Christian
> > Regards,
> > Burkhard
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
