Nick is right. Setting noout is the right move in this scenario. Restarting an OSD shouldn't block I/O unless nodown is also set, however.

The exception would be a case where min_size can't be met because of the down OSD, i.e. min_size=3 and 1 of 3 OSDs is restarting. That would certainly block writes. Otherwise the cluster will recognize down OSDs as down (without nodown set), redirect I/O requests to OSDs that are up, and backfill as necessary when things are back to normal.

You can set min_size to something lower if you don't have enough OSDs to allow you to restart one without blocking writes. If min_size isn't what is blocking you, something deeper is going on with your cluster. You shouldn't get slow requests from restarting a single OSD with only noout set and idle disks on the remaining OSDs. I've done this many, many times.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

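A minimal sketch of the min_size reasoning above, in Python. It is a simplified model, not Ceph code: it assumes a PG accepts writes only while the number of replicas still up is at least the pool's min_size. The function name writes_blocked and its parameters are made up for illustration.

```python
def writes_blocked(min_size: int, replicas: int, down: int) -> bool:
    """Return True if writes to a PG would block in this simplified model.

    Assumption: a PG serves writes only while the number of replicas
    that are still up is at least the pool's min_size.
    """
    up = replicas - down
    return up < min_size


# 3 replicas, min_size=3, one OSD restarting: writes block.
print(writes_blocked(min_size=3, replicas=3, down=1))

# 3 replicas, min_size=2, one OSD restarting: writes continue.
print(writes_blocked(min_size=2, replicas=3, down=1))
```

On a real cluster, min_size is a per-pool setting and can be lowered with "ceph osd pool set <pool> min_size <n>".
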
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Nick Fisk
Sent: Friday, February 12, 2016 9:07 AM
To: 'Christian Balzer' <chibi@xxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: Reducing the impact of OSD restarts (noout ain't uptosnuff)

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Christian Balzer
> Sent: 12 February 2016 15:38
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Reducing the impact of OSD restarts (noout ain't uptosnuff)
>
> On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:
>
> > Hi,
> >
> > On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > > Hello,
> > >
> > > yesterday I upgraded our busiest (in other words lethally
> > > overloaded) production cluster to the latest Firefly in
> > > preparation for a Hammer upgrade and then phasing in of a cache tier.
> > >
> > > When restarting the OSDs it took 3 minutes (1 minute in a
> > > consecutive repeat to test the impact of primed caches) during
> > > which the cluster crawled to a near stand-still and the dreaded
> > > slow requests piled up, causing applications in the VMs to fail.
> > >
> > > I had of course set things to "noout" beforehand, in hopes of
> > > staving off this kind of scenario.
> > >
> > > Note that the other OSDs and their backing storage were NOT
> > > overloaded during that time, only the backing storage of the OSD
> > > being restarted was under duress.
> > >
> > > I was under the (wishful thinking?) impression that with noout set
> > > and a controlled OSD shutdown/restart, operations would be
> > > redirected to the new primary for the duration.
> > > The strain on the restarted OSDs when recovering those operations
> > > (which I also saw) I was prepared for, the near screeching halt
> > > not so much.
> > >
> > > Any thoughts on how to mitigate this further or is this the
> > > expected behavior?
> >
> > I wouldn't use noout in this scenario. It keeps the cluster from
> > recognizing that an OSD is not available; other OSDs will still try to
> > write to that OSD. This is probably the cause of the blocked requests.
> > Redirecting only works if the cluster is able to detect a PG as
> > being degraded.
> >
> Oh well, that of course makes sense, but I found some article stating
> that it also would redirect things, and the recovery activity I saw
> afterwards suggests it did so at some point.

Doesn't noout just stop the crushmap from being modified, and hence data shuffling? Nodown controls whether or not the OSD is available for IO?

Maybe try the reverse. Set noup so that OSDs don't participate in IO and then bring them in manually?

> >
> > If the cluster is aware of the OSD being missing, it could handle
> > the write requests more gracefully. To prevent it from backfilling
> > etc, I prefer to use nobackfill and norecover. It blocks backfill on
> > the cluster level, but allows requests to be carried out (at least
> > in my understanding of these flags).
> >
> Yes, I concur and was thinking of that as well. Will give it a spin
> with the upgrade to Hammer.
>
> > 'noout' is fine for large scale cluster maintenance, since it keeps
> > the cluster from backfilling. I've used it when I had to power down our
> > complete cluster.
> >
> Guess with my other, less busy clusters, this never showed up on my radar.
>
> Regards,
>
> Christian
> > Regards,
> > Burkhard
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
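
The flag handling discussed in this thread (noout on its own versus nobackfill plus norecover) can be sketched as a small dry-run helper in Python. It only builds the list of commands and runs nothing. The "ceph osd set <flag>" and "ceph osd unset <flag>" commands are real. The helper name maintenance_plan and the restart placeholder are assumptions for illustration, since the actual restart command depends on the init system in use.

```python
from typing import List


def maintenance_plan(flags: List[str], restart_cmd: str) -> List[str]:
    """Build the command sequence for a flagged OSD restart (dry run only).

    flags       -- cluster flags to set first, e.g. ["noout"]
    restart_cmd -- placeholder for the site-specific restart command
    """
    set_cmds = [f"ceph osd set {flag}" for flag in flags]
    # Clear the flags in reverse order once the OSD is back up.
    unset_cmds = [f"ceph osd unset {flag}" for flag in reversed(flags)]
    return set_cmds + [restart_cmd] + unset_cmds


# Hypothetical placeholder; substitute whatever restarts the OSD on your host.
RESTART = "<restart the OSD here>"

for cmd in maintenance_plan(["nobackfill", "norecover"], RESTART):
    print(cmd)
```

Calling maintenance_plan(["noout"], RESTART) gives the noout-only variant for comparison.
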