On Fri, 12 Dec 2014, Loic Dachary wrote:
> Hi Sam & Sage,
>
> In the context of http://tracker.ceph.com/issues/9566 I'm inclined to
> think the best solution would be for the AsyncReserver to choose a PG
> instead of just picking the next one in the list when there is a free
> slot. It would always choose a PG that must move to/from an OSD for
> which there are more PGs waiting in the AsyncReserver than for any
> other OSD. The sort involved does not seem too expensive.
>
> Calculating priorities before adding the PG to the AsyncReserver seems
> wrong because the state of the system will change significantly while
> the PG is waiting to be processed. For instance, the first PGs to be
> added have a low priority while the later ones get increasing
> priorities as they accumulate. If reservations are canceled because
> the OSD map changed again (maybe another OSD is decommissioned before
> recovery of the first one completes), you may end up with high
> priorities for PGs that are no longer associated with busy OSDs. That
> could backfire and create even more frequent long tails.
>
> What do you think?

That makes sense. In order to make that decision, it means that the OSDs
need to be sharing the level of recovery work they have pending on a
regular basis, right?

sage
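
[Editor's note: a minimal standalone sketch of the selection rule Loic
describes, based only on local queue counts. This is not the actual
AsyncReserver API; the names PendingPG and pick_next are hypothetical,
for illustration only.]

  #include <cstdint>
  #include <map>
  #include <vector>

  // Hypothetical waiting-queue entry: a PG and the OSD it must move
  // to/from.
  struct PendingPG {
    uint64_t pgid;
    int osd;
  };

  // Return the index of the entry to grant the free slot to, or -1 if
  // the queue is empty: prefer a PG belonging to the OSD with the most
  // PGs still waiting.
  int pick_next(const std::vector<PendingPG>& waiting) {
    if (waiting.empty())
      return -1;
    // Count how many waiting PGs each OSD is involved in.
    std::map<int, int> load;
    for (const auto& p : waiting)
      ++load[p.osd];
    // Choose a PG associated with the busiest OSD.
    int best = 0;
    for (size_t i = 1; i < waiting.size(); ++i) {
      if (load[waiting[i].osd] > load[waiting[best].osd])
        best = static_cast<int>(i);
    }
    return best;
  }

The point of deferring the decision to grant time, as in the sketch, is
that the counts reflect the queue as it stands at that moment rather
than priorities computed when the PGs were first enqueued.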