Re: Reducing the impact of OSD restarts (noout ain't up to snuff)

Hi,

On 02/12/2016 03:47 PM, Christian Balzer wrote:
Hello,

yesterday I upgraded our most busy (in other words lethally overloaded)
production cluster to the latest Firefly in preparation for a Hammer
upgrade and then phasing in of a cache tier.

When restarting the OSDs it took 3 minutes (1 minute on a consecutive
repeat, to test the impact of primed caches), during which the cluster
crawled to a near standstill and the dreaded slow requests piled up,
causing applications in the VMs to fail.

I had of course set things to "noout" beforehand, in hopes of staving off
this kind of scenario.

Note that the other OSDs and their backing storage were NOT overloaded
during that time, only the backing storage of the OSD being restarted was
under duress.

I was under the (wishful thinking?) impression that with noout set and a
controlled OSD shutdown/restart, operations would be redirected to the new
primary for the duration.
The strain on the restarted OSDs when recovering those operations (which I
also saw) I was prepared for; the near screeching halt, not so much.

Any thoughts on how to mitigate this further or is this the expected
behavior?

I wouldn't use noout in this scenario. It keeps the cluster from recognizing that an OSD is not available; other OSDs will still try to write to that OSD, which is probably the cause of the blocked requests. Redirecting only works if the cluster is able to detect a PG as being degraded.
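
You can check which flags are currently set and whether requests are being blocked with e.g.:

  ceph osd dump | grep flags
  ceph health detail

(the exact wording of the blocked/slow request warnings varies a bit between releases).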

If the cluster is aware of the OSD being missing, it can handle the write requests more gracefully. To prevent it from backfilling etc., I prefer to use nobackfill and norecover. They block backfill at the cluster level, but still allow requests to be carried out (at least in my understanding of these flags), see the sketch below.
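
A rough sketch of what I mean for a single OSD restart (osd.12 is just an example id, and the restart command depends on your distro/init system):

  ceph osd set nobackfill
  ceph osd set norecover
  # restart the OSD, e.g. on a sysvinit-based Firefly node:
  service ceph restart osd.12
  # once the OSD is back up/in and its PGs have peered:
  ceph osd unset norecover
  ceph osd unset nobackfill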

'noout' is fine for large-scale cluster maintenance, since it keeps the cluster from backfilling. I've used it when I had to power down our complete cluster.
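
I.e. something along these lines around the maintenance window:

  ceph osd set noout
  # ... power down, do the maintenance, bring everything back up ...
  ceph osd unset noout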

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


