The only way I can see that this would happen is if maps were being
generated much more quickly than PGs could be updated. The solution to
that would be to throttle new map handling at the OSD to the rate at
which the PGs consume them. Alternatively, you could tweak the map
creation rate at the mons.
-Sam

On Fri, Jan 25, 2013 at 10:01 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Friday, January 25, 2013 at 9:50 AM, Sage Weil wrote:
>> Faidon/paravoid's cluster has a bunch of OSDs that are up, but the pg
>> queries indicate they are tens of thousands of epochs behind:
>>
>> "history": { "epoch_created": 14,
>> "last_epoch_started": 88174,
>> "last_epoch_clean": 88174,
>> "last_epoch_split": 0,
>> "same_up_since": 88172,
>> "same_interval_since": 88172,
>> "same_primary_since": 88172,
>>
>> (where the current map epoch is 102000 or thereabouts).
>>
>> I think just restarting all OSDs at once will get him caught up (esp with
>> a 'ceph osd set noup' block until they are done processing maps), but I
>> wonder if we may want an additional check that if any PG falls more than X
>> epochs behind, the OSD marks itself down and catches up before coming
>> in...
>>
>> What do you think?
>
> Sam's explained to me why this "shouldn't" happen (since events for each
> PG get queued on every map update), so it sounds like it would be better
> to prevent the mess (e.g., add some basic fairness to the PG work queue
> dispatchers in order to prevent any PG from falling so far behind),
> rather than trying to clean the mess up.
> -Greg
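
To make the throttling idea concrete, here is a rough toy model, not Ceph code. It only illustrates "admit new maps no faster than the slowest PG consumes them", combined with the round-robin fairness Greg mentions. Every name in it (ToyOSD, max_lag, receive_map, process_one_pg) is hypothetical, and it assumes map epochs arrive contiguously and in increasing order.

```python
from collections import deque


class ToyOSD:
    """Hypothetical model of an OSD that throttles map admission (not Ceph code)."""

    def __init__(self, pg_ids, max_lag):
        self.max_lag = max_lag                  # allowed gap between OSD epoch and slowest PG
        self.osd_epoch = 0                      # newest map epoch the OSD has admitted
        self.pg_epoch = {pg: 0 for pg in pg_ids}
        self.pending_maps = deque()             # epochs received but held back by the throttle
        self.run_queue = deque(pg_ids)          # round-robin order, so no PG is starved

    def slowest_pg_epoch(self):
        return min(self.pg_epoch.values())

    def receive_map(self, epoch):
        """Queue a new map; it is admitted only if the slowest PG is close enough."""
        self.pending_maps.append(epoch)
        self._admit_maps()

    def _admit_maps(self):
        # Admit queued maps while doing so keeps every PG within max_lag epochs.
        while (self.pending_maps
               and self.pending_maps[0] - self.slowest_pg_epoch() <= self.max_lag):
            self.osd_epoch = self.pending_maps.popleft()

    def process_one_pg(self):
        """Advance the next PG in round-robin order by at most one epoch."""
        pg = self.run_queue.popleft()
        self.pg_epoch[pg] = min(self.osd_epoch, self.pg_epoch[pg] + 1)
        self.run_queue.append(pg)
        self._admit_maps()                      # PG progress may release held-back maps


# A burst of 100 maps arrives before any PG has done any work.
osd = ToyOSD(["1.0", "1.1", "1.2"], max_lag=5)
for epoch in range(1, 101):
    osd.receive_map(epoch)
print(osd.osd_epoch, len(osd.pending_maps))    # 5 95
```

With max_lag=5, the OSD stops at epoch 5 and holds the other 95 maps until the PGs advance, so no PG can end up thousands of epochs behind. The cost is that the OSD as a whole runs behind the mons during a burst, which is the trade-off of throttling at the OSD instead of slowing map creation at the mons.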