On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:

> Hi,
>
> On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > Hello,
> >
> > yesterday I upgraded our most busy (in other words lethally overloaded)
> > production cluster to the latest Firefly in preparation for a Hammer
> > upgrade and then phasing in of a cache tier.
> >
> > When restarting the OSDs it took 3 minutes (1 minute in a consecutive
> > repeat to test the impact of primed caches) during which the cluster
> > crawled to a near stand-still and the dreaded slow requests piled up,
> > causing applications in the VMs to fail.
> >
> > I had of course set things to "noout" beforehand, in hopes of staving
> > off this kind of scenario.
> >
> > Note that the other OSDs and their backing storage were NOT overloaded
> > during that time, only the backing storage of the OSD being restarted
> > was under duress.
> >
> > I was under the (wishful thinking?) impression that with noout set and
> > a controlled OSD shutdown/restart, operations would be redirected to the
> > new primary for the duration.
> > The strain on the restarted OSDs when recovering those operations
> > (which I also saw) I was prepared for, the near screeching halt not so
> > much.
> >
> > Any thoughts on how to mitigate this further or is this the expected
> > behavior?
>
> I wouldn't use noout in this scenario. It keeps the cluster from
> recognizing that an OSD is not available; other OSDs will still try to
> write to that OSD. This is probably the cause of the blocked requests.
> Redirecting only works if the cluster is able to detect a PG as being
> degraded.
>
Oh well, that of course makes sense, but I found some article stating that
it would also redirect things, and the recovery activity I saw afterwards
suggests it did so at some point.

> If the cluster is aware of the OSD being missing, it could handle the
> write requests more gracefully. To prevent it from backfilling etc, I
> prefer to use nobackfill and norecover. It blocks backfill on the
> cluster level, but allows requests to be carried out (at least in my
> understanding of these flags).
>
Yes, I concur and was thinking of that as well.
Will give it a spin with the upgrade to Hammer.

> 'noout' is fine for large scale cluster maintenance, since it keeps the
> cluster from backfilling. I've used it when I had to power down our
> complete cluster.
>
Guess with my other, less busy clusters, this never showed up on my radar.

Regards,

Christian

> Regards,
> Burkhard
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
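
For reference, the nobackfill/norecover approach suggested in the thread could be scripted roughly as below. This is only a sketch under assumptions: `maintenance_plan` is a hypothetical helper name, the restart command is a placeholder that depends on your init system, and nothing here waits for the OSD to rejoin before the flags are cleared. It only builds and prints the command sequence (a dry run); it does not execute anything.

```python
from typing import List, Sequence


def maintenance_plan(
    restart_cmd: Sequence[str],
    flags: Sequence[str] = ("nobackfill", "norecover"),
) -> List[List[str]]:
    """Build the command sequence: set flags, restart, unset flags.

    restart_cmd is supplied by the caller because the way an OSD is
    restarted differs between init systems (assumption: you know yours).
    """
    # Set each flag cluster-wide before touching the OSD.
    plan = [["ceph", "osd", "set", flag] for flag in flags]
    # The restart itself; this sketch does not wait for the OSD to come back.
    plan.append(list(restart_cmd))
    # Clear the flags again in reverse order so recovery can proceed.
    plan += [["ceph", "osd", "unset", flag] for flag in reversed(flags)]
    return plan


if __name__ == "__main__":
    # Placeholder restart command and OSD id, for illustration only.
    for cmd in maintenance_plan(["service", "ceph", "restart", "osd.3"]):
        print(" ".join(cmd))
```

Whether this actually avoids the blocked requests described above is untested here; as noted in the thread, it reflects one poster's understanding of these flags.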