Re: Reducing the impact of OSD restarts (noout ain't uptosnuff)

Christian Balzer <chibi@xxxxxxx> · Sat, 13 Feb 2016 00:38:15 +0900



On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:

> Hi,
> 
> On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > Hello,
> >
> > yesterday I upgraded our busiest (in other words, lethally overloaded)
> > production cluster to the latest Firefly in preparation for a Hammer
> > upgrade and then phasing in of a cache tier.
> >
> > When restarting the OSDs it took 3 minutes (1 minute when I immediately
> > repeated it to test the impact of primed caches), during which the cluster
> > crawled to a near stand-still and the dreaded slow requests piled up,
> > causing applications in the VMs to fail.
> >
> > I had of course set things to "noout" beforehand, in hopes of staving
> > off this kind of scenario.
> >
> > Note that the other OSDs and their backing storage were NOT overloaded
> > during that time, only the backing storage of the OSD being restarted
> > was under duress.
> >
> > I was under the (wishful thinking?) impression that with noout set and
> > a controlled OSD shutdown/restart, operations would be redirected to the
> > new primary for the duration.
> > I was prepared for the strain on the restarted OSDs when recovering
> > those operations (which I also saw), but not for the near screeching
> > halt.
> >
> > Any thoughts on how to mitigate this further or is this the expected
> > behavior?
> 
> I wouldn't use noout in this scenario. It keeps the cluster from 
> recognizing that an OSD is not available; other OSDs will still try to 
> write to that OSD. This is probably the cause of the blocked requests. 
> Redirecting only works if the cluster is able to detect a PG as being 
> degraded.
> 
Oh well, that of course makes sense, but I found an article stating that
it would also redirect things, and the recovery activity I saw afterwards
suggests it did so at some point.
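Whether the cluster has actually marked a restarting OSD down (and whether it is still "in") can be read from the OSD map, which would help settle the redirect question empirically. A minimal Python sketch, assuming the JSON output of `ceph osd dump --format json` contains an `osds` list with integer `osd`, `up` and `in` fields plus a comma-separated `flags` string; the helper names `osd_states` and `cluster_flags` are hypothetical, and the layout should be checked against your release:

```python
import json
from typing import Dict, Set


def osd_states(osd_dump_json: str) -> Dict[int, Dict[str, bool]]:
    """Map OSD id -> {"up": ..., "in": ...} from `ceph osd dump --format json`.

    ASSUMPTION: the dump has an "osds" list whose entries carry integer
    "osd", "up" and "in" fields (1 = true, 0 = false).
    """
    dump = json.loads(osd_dump_json)
    return {
        entry["osd"]: {"up": bool(entry["up"]), "in": bool(entry["in"])}
        for entry in dump["osds"]
    }


def cluster_flags(osd_dump_json: str) -> Set[str]:
    """Return the cluster-wide flags (noout, nobackfill, ...) as a set.

    ASSUMPTION: flags are reported as one comma-separated "flags" string;
    a missing or empty field yields an empty set.
    """
    raw = json.loads(osd_dump_json).get("flags", "")
    return {flag for flag in raw.split(",") if flag}
```

Feed both functions the dump captured while the OSD is restarting, e.g. the `stdout` of `subprocess.run(["ceph", "osd", "dump", "--format", "json"], capture_output=True, text=True, check=True)`, and look at the entry for the OSD in question.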

> If the cluster is aware of the OSD being missing, it could handle the 
> write requests more gracefully. To prevent it from backfilling etc., I
> prefer to use nobackfill and norecover. They block backfill and recovery
> at the cluster level, but allow requests to be carried out (at least in my
> understanding of these flags).
> 
Yes, I concur and was thinking of that as well. Will give it a spin with
the upgrade to Hammer.
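To give the nobackfill/norecover approach a spin without risking a flag being left set after a failed restart, the set/unset pair can be wrapped in a context manager. This is a minimal sketch, not an established tool: `osd_flags` and `run_checked` are hypothetical names, and only the `ceph osd set <flag>` / `ceph osd unset <flag>` commands come from the Ceph CLI. The `runner` parameter exists so the command sequence can be inspected without a cluster:

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

# A runner executes one command given as an argv list.
Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


@contextmanager
def osd_flags(*flags: str, runner: Runner = run_checked) -> Iterator[None]:
    """Set cluster-wide OSD flags for the duration of a maintenance block.

    Hypothetical helper: only `ceph osd set|unset <flag>` is real Ceph CLI.
    Flags that were successfully set are unset again on exit, in reverse
    order, even if the block (or a later `set`) raises.
    """
    applied: List[str] = []
    try:
        for flag in flags:
            runner(["ceph", "osd", "set", flag])
            applied.append(flag)
        yield
    finally:
        for flag in reversed(applied):
            runner(["ceph", "osd", "unset", flag])
```

For example, on a systemd-based install (an assumption; the Firefly and Hammer releases discussed here typically used sysvinit or upstart scripts) with a hypothetical OSD 12: `with osd_flags("nobackfill", "norecover"): run_checked(["systemctl", "restart", "ceph-osd@12"])`. `systemctl restart` returns once the daemon has started, not once it has rejoined and peered, so you may want to wait for the OSD to show as up before leaving the block; unsetting the flags is what then lets the restarted OSD catch up on the writes it missed.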

> 'noout' is fine for large-scale cluster maintenance, since it keeps the
> cluster from backfilling. I've used it when I had to power down our
> entire cluster.
> 
Guess with my other, less busy clusters, this never showed up on my radar.

Regards,

Christian
> Regards,
> Burkhard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/