Just try to give the booting OSD and all MONs the resources they ask for
(CPU, memory).
Yes, it causes disruption, but only for a select group of clients, and only
for a moment (<20s with my extremely high number of PGs).
From a service provider perspective this might break SLAs, but until you get
to >60s blocked requests it will not manifest in clients (VM guests) directly.

As for fixing it: I still think that the only way to achieve any form of SLA
is that I/O should never hang if one of the acting OSDs is acting up. In other
words, doing a write to min_size OSDs should always be enough; let the rest of
the writes buffer somewhere for a while. Even if you fix this one case, there
are other situations where one OSD can block I/O unnecessarily.

Jan

> On 10 Dec 2015, at 10:03, Christian Kauhaus <kc@xxxxxxxxxxxxxxx> wrote:
>
> On 10.12.2015 at 06:38, Robert LeBlanc wrote:
>> I noticed this a while back and did some tracing. As soon as the PGs
>> are read in by the OSD (very limited amount of housekeeping done), the
>> OSD is set to the "in" state so that peering with other OSDs can
>> happen and the recovery process can begin. The problem is that when
>> the OSD is "in", the clients also see that and start sending requests
>> to the OSDs before it has had a chance to actually get its bearings
>> and is able to even service the requests. After discussion with some
>> of the developers, there is no easy way around this other than let the
>> PGs recover to other OSDs and then bring in the OSDs after recovery (a
>> ton of data movement).
>
> Many thanks for your detailed analysis. It's a bit disappointing that there
> seems to be no easy way around. Any work to improve the situation is much
> appreciated.
>
> In the meantime, I'll be experimenting with pre-seeding the VFS cache to speed
> things up at least a little bit.
>
> Regards
>
> Christian
>
> --
> Dipl-Inf.
Christian Kauhaus <>< · kc@xxxxxxxxxxxxxxx · +49 345 219401-0
> Flying Circus Internet Operations GmbH · http://flyingcircus.io
> Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
> HR Stendal 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
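
P.S. To make the min_size idea at the top of this message a bit more concrete,
here is a toy sketch in Python. It is not Ceph code and says nothing about how
Ceph handles replica writes internally; `fake_osd_write`, `replicated_write`
and the delays are hypothetical names and numbers made up for illustration.
The point is only the shape of the proposal: acknowledge the client once
min_size replicas have the write, and let the slow replica finish later.

```python
import asyncio


async def fake_osd_write(osd_id, delay):
    # Hypothetical stand-in for a replica write; the delay simulates OSD latency.
    await asyncio.sleep(delay)
    return osd_id


async def replicated_write(delays, min_size):
    """Start one write per OSD and return as soon as min_size of them finished.

    Returns (acked, pending): the sorted ids of the OSDs that completed, and
    the set of still-running tasks for the remaining OSDs.
    """
    pending = {
        asyncio.create_task(fake_osd_write(osd_id, delay))
        for osd_id, delay in enumerate(delays)
    }
    acked = []
    while len(acked) < min_size and pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        acked.extend(task.result() for task in done)
    return sorted(acked), pending


async def demo():
    # OSD 2 is "acting up": its write takes far longer than the other two.
    acked, pending = await replicated_write([0.01, 0.05, 0.3], min_size=2)
    # At this point the client could already be acknowledged.
    # The slow write keeps running in the background and is collected here.
    late = [await task for task in pending]
    return acked, late


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The sketch leaves out everything that makes this hard in practice, in
particular where the outstanding writes are buffered and what happens if the
slow OSD never completes them.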