On Wed, 1 Mar 2017, Loic Dachary wrote:
> Hi Sage,
>
> While reading the implementation and the commits related to crush
> tunables & rule step set_* I got the impression that vary_r [1],
> introduced early 2014, is a better solution than local_fallback_tries
> etc. [2], introduced around 2012. Both help to select an
> alternative item when there is a collision.

Yeah, although if memory serves local_fallback_tries was part of the
original CRUSH code.  The problem with the local retry was that it skews
data toward drives that are close to an 'out' drive in the hierarchy.
I.e., if you have a 12-drive server and mark one of its drives out, the
other 11 drives get most of that data instead of it being spread across
the rest of the cluster.  That's clearly not optimal, which is why the
local retries now default to 0.  (Caleb figured this out when he was
doing his analysis back in 2012 or 2013.)

> If that's not the case I'll try to research the motivation to set
> local_fallback_retries = 1 instead of the default value 0 it has in
> tunables since bobtail [3]. And although nothing forbids using
> local_fallback_tries = 1 & vary_r = 1, it looks like the vary_r
> implementation did not take that combination into account.

Yeah; the assumption is that local fallback tries should never be used
except for compatibility with old maps with old tunables.  I honestly
don't remember exactly what motivated me to do the local fallback... I
think it seemed intuitive at the time, but in retrospect it is not a
good idea.

vary_r was added to address a different, specific issue: when using
chooseleaf, we do the first-level choice (say, a rack) and then
recursively pick a disk nested beneath it.  The problem was that we'd
collide in the second choice (disk), increment r, and try again... but
if we picked the same rack, the recursive call to choose a disk would
use the original r value and inevitably collide again.  Chooseleaf was
introduced after local_fallback_tries, so the motivation was different...
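
Roughly, the difference looks like the toy sketch below.  This is not the
actual crush/mapper.c code: toy_hash(), choose_in_bucket(), the
single-rack layout and the retry limit are all made up for illustration.
Without vary_r the recursive disk step always reuses the original r, so
landing in the same rack replays the same collision; with vary_r the disk
step is offset by the outer retry count, so a retry in the same rack can
land on a different disk.

/* Toy model of chooseleaf retries; NOT the real crush/mapper.c.
 * toy_hash() stands in for the Jenkins hash CRUSH actually uses. */
#include <stdio.h>

static unsigned toy_hash(unsigned x, unsigned bucket, unsigned r)
{
    unsigned h = x * 2654435761u ^ bucket * 40503u ^ r * 2246822519u;
    return (h >> 16) ^ h;
}

/* pick one of 'size' children of 'bucket' for replica slot r */
static int choose_in_bucket(unsigned x, unsigned bucket, unsigned size,
                            unsigned r)
{
    return toy_hash(x, bucket, r) % size;
}

/* chooseleaf: pick a rack under the root, then a disk under that rack.
 * For the demo there is a single rack, so every outer retry lands on
 * the same rack, which is exactly the problematic case above. */
static int chooseleaf(unsigned x, unsigned r, int vary_r, int taken_disk)
{
    unsigned ftotal;

    for (ftotal = 0; ftotal < 50; ftotal++) {   /* plenty of retries */
        unsigned rr = r + ftotal;               /* outer r bumped per retry */
        int rack = choose_in_bucket(x, 0, 1, rr);

        /* old behavior: inner descent reuses the original r;
         * vary_r: inner descent is offset by the retry count too */
        unsigned sub_r = vary_r ? rr : r;
        int disk = choose_in_bucket(x, 100 + rack, 12, sub_r);

        if (disk == taken_disk)                 /* pretend this disk collides */
            continue;
        return rack * 100 + disk;
    }
    return -1;                                  /* gave up */
}

int main(void)
{
    unsigned x = 0x12345, r = 1;
    /* the disk the first attempt maps to is the one we collide with */
    int taken = choose_in_bucket(x, 100 + choose_in_bucket(x, 0, 1, r), 12, r);

    printf("without vary_r: %d\n", chooseleaf(x, r, 0, taken));
    printf("with vary_r:    %d\n", chooseleaf(x, r, 1, taken));
    return 0;
}

Without vary_r the loop keeps re-choosing the same disk and gives up
(prints -1); with vary_r it should escape the collision and print a
rack*100+disk id.  The real chooseleaf_vary_r tunable achieves
essentially this by deriving the recursive call's r from the outer r,
which includes the retry count.

sage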