Yeah, I'm more concerned about individual object durability. This seems like
a good way (in ongoing flapping or whatever) for objects at the tail end of a
PG to never get properly replicated, even as we expend lots of IO repeatedly
recovering earlier objects which are already better replicated. :/ Perhaps
min_size et al make this a moot point, but... I don't think so. I haven't
worked it all the way through.
-Greg

On Fri, Nov 6, 2015 at 8:48 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Nope, it's worse: there could be arbitrary interleavings of backfilled and
> unbackfilled portions on any particular incomplete osd. We'd need a
> backfilled_regions field with a type like map<hobject_t, hobject_t>
> mapping backfilled regions begin->end. It's pretty tedious, but
> doable, provided that we bound how large the mapping gets. I'm
> skeptical about how large an effect this would actually have on
> overall durability (how frequent is this case?). Once Allen does the
> math, we'll have a better idea :)
> -Sam
>
> On Fri, Nov 6, 2015 at 8:43 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> Argh, I guess I was wrong. Sorry for the misinformation, all! :(
>>
>> If we were to try and do this, Sam, do you have any idea how much it
>> would take? Presumably we'd have to add a backfill_begin marker to
>> bookend with last_backfill_started, and then everywhere we send over
>> object ops we'd have to compare against both of those values. But I'm
>> not sure how many sites that's likely to be, what other kinds of paths
>> rely on last_backfill_started, or whether I'm missing something.
>> -Greg
>>
>> On Fri, Nov 6, 2015 at 8:30 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>> What it actually does is rebuild 3 until it catches up with 2 and then
>>> it rebuilds them in parallel (to minimize reads). Optimally, we'd
>>> start 3 from where 2 left off and then circle back, but we'd have to
>>> complicate the metadata we use to track backfill.
>>> -Sam
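
For concreteness, here is a minimal sketch of the backfilled_regions idea
Sam describes: a map from region begin to region end, plus a merge-on-insert
step so the mapping stays bounded. This is illustrative only, not Ceph code;
BackfillIntervals, is_backfilled, and mark_backfilled are made-up names, and
std::string stands in for hobject_t purely because both are totally ordered.

  // Sketch of "backfilled_regions": begin -> end intervals recording which
  // ranges of the PG's object ordering have been backfilled on an incomplete
  // OSD. Hypothetical code; std::string stands in for hobject_t.
  #include <cassert>
  #include <iterator>
  #include <map>
  #include <string>

  using Obj = std::string;  // stand-in for hobject_t (any totally ordered key)

  struct BackfillIntervals {
    // begin -> end (half-open): every object in [begin, end) is backfilled.
    std::map<Obj, Obj> regions;

    // Does obj fall inside some recorded backfilled region, i.e. would the
    // incomplete OSD already hold a copy of it?
    bool is_backfilled(const Obj& obj) const {
      auto it = regions.upper_bound(obj);   // first region with begin > obj
      if (it == regions.begin()) return false;
      --it;                                 // region with begin <= obj
      return obj < it->second;              // inside if obj precedes its end
    }

    // Record that [begin, end) has been backfilled, merging overlapping or
    // touching regions so the map stays small (the "bound how large the
    // mapping gets" part).
    void mark_backfilled(Obj begin, Obj end) {
      auto it = regions.upper_bound(begin);
      if (it != regions.begin()) {
        auto prev = std::prev(it);
        if (prev->second >= begin) {        // overlaps/touches previous region
          begin = prev->first;
          if (prev->second > end) end = prev->second;
          it = regions.erase(prev);
        }
      }
      while (it != regions.end() && it->first <= end) {  // swallow later regions
        if (it->second > end) end = it->second;
        it = regions.erase(it);
      }
      regions[begin] = end;
    }
  };

  int main() {
    BackfillIntervals bi;
    bi.mark_backfilled("0000", "3000");  // range copied before the interruption
    bi.mark_backfilled("7000", "9000");  // a second target resumed further along
    assert(bi.is_backfilled("1234"));
    assert(!bi.is_backfilled("5000"));   // gap between the two regions
    bi.mark_backfilled("2500", "7500");  // backfill closes the gap; regions merge
    assert(bi.regions.size() == 1);
  }

Recovery paths would presumably then consult something like is_backfilled()
(in addition to the existing last_backfill comparison) before deciding
whether an incomplete OSD needs an object pushed, which is roughly the set
of call sites Greg is asking about above.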