Re: Reducing the impact of OSD restarts (noout ain't uptosnuff)

Nick Fisk <nick@xxxxxxxxxx> · Fri, 12 Feb 2016 16:42:03 -0000

I wonder if Christian is hitting some performance issue when one OSD, or a
number of OSDs, all start up at once? Or maybe the OSD is still doing some
internal startup work and, when IO hits it on a very busy cluster, it
becomes overloaded for a few seconds?

I've seen similar things in the past: when I did not have enough
vm.min_free_kbytes configured, PGs would take a long time to peer/activate,
causing slow ops.
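Assuming the setting meant here is the Linux vm.min_free_kbytes sysctl, the first step is simply checking its current value on each OSD host. A minimal Python sketch that reads it from procfs; the helper names are hypothetical, and it makes no recommendation about what the value should be:

```python
from pathlib import Path
from typing import Optional

# Assumption: the Linux vm.min_free_kbytes sysctl, exposed via procfs.
MIN_FREE_KBYTES_PATH = "/proc/sys/vm/min_free_kbytes"


def parse_min_free_kbytes(text: str) -> Optional[int]:
    """Parse procfs contents (e.g. "67584\n") into an int, or None if malformed."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_min_free_kbytes(path: str = MIN_FREE_KBYTES_PATH) -> Optional[int]:
    """Return the current value in KB, or None if it can't be read (e.g. not Linux)."""
    try:
        return parse_min_free_kbytes(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    kb = read_min_free_kbytes()
    if kb is None:
        print("vm.min_free_kbytes is not readable on this host")
    else:
        print(f"vm.min_free_kbytes = {kb} KB (~{kb / 1024:.0f} MB)")
```

On the host itself, `sysctl vm.min_free_kbytes` reports the same value.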

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Steve Taylor
> Sent: 12 February 2016 16:32
> To: Nick Fisk <nick@xxxxxxxxxx>; 'Christian Balzer' <chibi@xxxxxxx>;
> ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Reducing the impact of OSD restarts (noout ain't
> uptosnuff)
> 
> Nick is right. Setting noout is the right move in this scenario. Restarting
> an OSD shouldn't block I/O, however, unless nodown is also set. The
> exception would be a case where min_size can't be met because of the down
> OSD, e.g. size=3, min_size=3 and 1 of the 3 OSDs is restarting. That would
> certainly block writes. Otherwise (without nodown set) the cluster will
> recognize down OSDs as down, redirect I/O requests to OSDs that are up, and
> recover (or backfill) as necessary when things are back to normal.
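To make the noout/nodown distinction concrete, here is a toy Python model (not Ceph code) of which OSD a PG's client I/O goes to after an OSD stops. It assumes the primary is simply the first OSD of the acting set and that a stopped OSD only leaves the acting set once the monitors mark it down; the function name and OSD ids are made up for illustration:

```python
from typing import List, Optional, Set


def pg_primary(acting: List[int], stopped: Set[int], nodown: bool) -> Optional[int]:
    """Toy model: return the OSD that clients would send this PG's I/O to.

    Assumptions (simplifications, not Ceph internals):
      * the primary is the first OSD in the acting set;
      * a stopped OSD drops out of the acting set only once it is marked down.
    noout plays no part here: it only stops a down OSD from later being
    marked out, i.e. it prevents remapping, not the down-marking itself.
    """
    if nodown:
        # With nodown set, the stopped OSD is never marked down, so it keeps
        # its place in the acting set and I/O aimed at it simply stalls.
        candidates = list(acting)
    else:
        # Once marked down, the PG re-peers with the OSDs that are still up.
        candidates = [osd for osd in acting if osd not in stopped]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    acting = [3, 7, 12]  # hypothetical acting set; osd.3 is restarting
    print(pg_primary(acting, {3}, nodown=False))  # 7: a live OSD takes over
    print(pg_primary(acting, {3}, nodown=True))   # 3: still the stopped OSD
```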
> 
> You can set min_size lower if you don't have enough OSDs to restart one
> without blocking writes. If that isn't the case, something deeper is going
> on with your cluster. You shouldn't get slow requests from restarting a
> single OSD with only noout set and idle disks on the remaining OSDs. I've
> done this many, many times.
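Steve's min_size rule of thumb as a small Python sketch. This is a simplified model assuming a replicated pool where a PG needs at least min_size of its size replicas up to accept writes; it ignores erasure-coded pools, and the function names are hypothetical:

```python
def writes_blocked(size: int, min_size: int, osds_down: int) -> bool:
    """True if a replicated PG with `osds_down` replicas down would block writes.

    Simplified model: writes need at least `min_size` of the `size` replicas up.
    """
    if not 1 <= min_size <= size:
        raise ValueError("expected 1 <= min_size <= size")
    if not 0 <= osds_down <= size:
        raise ValueError("expected 0 <= osds_down <= size")
    return size - osds_down < min_size


def largest_min_size_for_restart(size: int, restarting: int = 1) -> int:
    """Largest min_size that keeps writes flowing while `restarting` replicas are down."""
    if not 0 <= restarting < size:
        raise ValueError("expected 0 <= restarting < size")
    return size - restarting


if __name__ == "__main__":
    print(writes_blocked(size=3, min_size=3, osds_down=1))  # True: Steve's example
    print(writes_blocked(size=3, min_size=2, osds_down=1))  # False
    print(largest_min_size_for_restart(size=3))             # 2
```

The pool setting itself is changed with `ceph osd pool set <pool> min_size <n>` and read back with `ceph osd pool get <pool> min_size`.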
> 
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 | Fax: 801.545.4705
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Nick Fisk
> Sent: Friday, February 12, 2016 9:07 AM
> To: 'Christian Balzer' <chibi@xxxxxxx>; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Reducing the impact of OSD restarts (noout ain't
> uptosnuff)
> 
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Christian Balzer
> > Sent: 12 February 2016 15:38
> > To: ceph-users@xxxxxxxxxxxxxx
> > Subject: Re: Reducing the impact of OSD restarts (noout ain't
> > uptosnuff)
> >
> > On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:
> >
> > > Hi,
> > >
> > > On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > > > Hello,
> > > >
> > > > yesterday I upgraded our busiest (in other words, lethally
> > > > overloaded) production cluster to the latest Firefly, in
> > > > preparation for a Hammer upgrade and then the phasing in of a
> > > > cache tier.
> > > >
> > > > When restarting the OSDs it took 3 minutes (1 minute on an
> > > > immediate repeat, to test the impact of primed caches), during
> > > > which the cluster crawled to a near standstill and the dreaded
> > > > slow requests piled up, causing applications in the VMs to fail.
> > > >
> > > > I had of course set things to "noout" beforehand, in hopes of
> > > > staving off this kind of scenario.
> > > >
> > > > Note that the other OSDs and their backing storage were NOT
> > > > overloaded during that time, only the backing storage of the OSD
> > > > being restarted was under duress.
> > > >
> > > > I was under the (wishful thinking?) impression that with noout set
> > > > and a controlled OSD shutdown/restart, operations would be
> > > > redirected to the new primary for the duration.
> > > > I was prepared for the strain on the restarted OSDs when recovering
> > > > those operations (which I also saw); the near screeching halt, not
> > > > so much.
> > > >
> > > > Any thoughts on how to mitigate this further or is this the
> > > > expected behavior?
> > >
> > > I wouldn't use noout in this scenario. It keeps the cluster from
> > > recognizing that an OSD is not available; other OSDs will still try to
> > > write to that OSD. This is probably the cause of the blocked requests.
> > > Redirecting only works if the cluster is able to detect a PG as
> > > being degraded.
> > >
> > Oh well, that of course makes sense, but I found an article stating
> > that it would also redirect things, and the recovery activity I saw
> > afterwards suggests it did so at some point.
> 
> Doesn't noout just stop OSDs from being marked out (so their PGs aren't
> remapped), and hence the data shuffling? Isn't it nodown that controls
> whether or not the OSD is available for IO?
> 
> Maybe try the reverse: set noup so that restarted OSDs don't participate
> in IO, and then bring them in manually?
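For reference, a rough summary of what each OSD flag mentioned in this thread does, written as a small Python lookup. The descriptions are my own simplified paraphrase, not quoted from the Ceph docs, and `explain` is a hypothetical helper:

```python
from typing import Dict, Iterable, List

# Simplified paraphrase of the cluster-wide flags set with `ceph osd set <flag>`.
OSD_FLAG_EFFECTS: Dict[str, str] = {
    "noout": "down OSDs are not marked out, so their PGs are not remapped "
             "and no data is shuffled to other OSDs; I/O is served by the "
             "remaining replicas (PGs degraded)",
    "nodown": "OSDs are not marked down, so a stopped OSD stays in its PGs' "
              "acting sets and I/O aimed at it can stall",
    "noup": "booting OSDs are not marked up, so they stay down and take "
            "no client I/O until the flag is unset",
    "nobackfill": "backfill is suspended cluster-wide",
    "norecover": "recovery is suspended cluster-wide",
}


def explain(flags: Iterable[str]) -> List[str]:
    """Return one 'flag: effect' line per flag; unknown flags are reported as such."""
    return [f"{flag}: {OSD_FLAG_EFFECTS.get(flag, 'unknown flag')}" for flag in flags]


if __name__ == "__main__":
    for line in explain(["noout", "noup"]):
        print(line)
```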
> 
> >
> > > If the cluster is aware of the OSD being missing, it can handle the
> > > write requests more gracefully. To prevent it from backfilling etc., I
> > > prefer to use nobackfill and norecover. They block backfill and
> > > recovery at the cluster level, but still allow client requests to be
> > > carried out (at least in my understanding of these flags).
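Burkhard's approach (nobackfill and norecover instead of noout) can be written out as a command plan. A minimal Python sketch that only builds the list of commands and runs nothing; the helper name is hypothetical, and the restart command in the example assumes a systemd-managed host (Firefly/Hammer installs typically used sysvinit or upstart scripts instead):

```python
from typing import List, Sequence


def with_osd_flags(flags: Sequence[str], steps: Sequence[str]) -> List[str]:
    """Wrap maintenance steps between `ceph osd set` and `ceph osd unset` commands.

    Flags are unset in the reverse order they were set. Nothing is executed;
    the returned list is meant to be reviewed (or run) by an operator.
    """
    commands = [f"ceph osd set {flag}" for flag in flags]
    commands.extend(steps)
    commands.extend(f"ceph osd unset {flag}" for flag in reversed(flags))
    return commands


if __name__ == "__main__":
    # Hypothetical OSD id and restart command; adjust for the host's init system.
    plan = with_osd_flags(
        ["nobackfill", "norecover"],
        ["systemctl restart ceph-osd@12"],
    )
    print("\n".join(plan))
```

The same helper covers Christian's original approach (`["noout"]`) or Nick's noup idea (`["noout", "noup"]`), the idea there being that noup is only unset once the restarted daemon has finished starting.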
> > >
> > Yes, I concur, and was thinking of that as well. I'll give it a spin
> > with the upgrade to Hammer.
> >
> > > 'noout' is fine for large-scale cluster maintenance, since it keeps
> > > the cluster from backfilling. I've used it when I had to power down
> > > our entire cluster.
> > >
> > I guess with my other, less busy clusters, this never showed up on my
> > radar.
> >
> > Regards,
> >
> > Christian
> > > Regards,
> > > Burkhard
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/