Faidon/paravoid's cluster has a bunch of OSDs that are up, but the pg queries indicate they are tens of thousands of epochs behind:

    "history": {
        "epoch_created": 14,
        "last_epoch_started": 88174,
        "last_epoch_clean": 88174,
        "last_epoch_split": 0,
        "same_up_since": 88172,
        "same_interval_since": 88172,
        "same_primary_since": 88172,

(where the current map epoch is 102000 or thereabouts).

I think just restarting all OSDs at once will get him caught up (especially with 'ceph osd set noup' to block them from coming up until they are done processing maps), but I wonder if we may want an additional check: if any PG falls more than X epochs behind, the OSD marks itself down and catches up before coming back in...

What do you think?

sage
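A minimal sketch of the proposed "more than X epochs behind" check, written in Python for brevity. `PgState`, its `map_epoch` field (assumed to be the epoch of the last OSDMap the PG has processed), `lagging_pgs`, `should_mark_self_down`, and the `max_lag` threshold standing in for "X" are all hypothetical names for illustration, not Ceph APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PgState:
    """Hypothetical per-PG summary used only for this sketch."""
    pgid: str
    map_epoch: int  # assumed: epoch of the last OSDMap this PG has processed


def lagging_pgs(newest_epoch: int, pgs: Iterable[PgState], max_lag: int) -> List[str]:
    """Return the ids of PGs that are more than max_lag epochs behind newest_epoch."""
    return [pg.pgid for pg in pgs if newest_epoch - pg.map_epoch > max_lag]


def should_mark_self_down(newest_epoch: int, pgs: Iterable[PgState], max_lag: int) -> bool:
    """True if any PG is too far behind.

    Under the proposal, the OSD would then mark itself down, consume the
    missing maps, and only ask to be marked up again once caught up.
    """
    return bool(lagging_pgs(newest_epoch, pgs, max_lag))


if __name__ == "__main__":
    # Epochs taken from the report above; pg ids and max_lag are made up.
    pgs = [PgState("1.0", 101990), PgState("1.1", 88172)]
    print(should_mark_self_down(102000, pgs, max_lag=1000))  # True
    print(lagging_pgs(102000, pgs, max_lag=1000))  # ['1.1']
```

With the numbers from this cluster, a PG stuck at 88172 is about 13.8k epochs behind 102000, so any reasonable X would trip the check. Returning the list of lagging PGs, not just a boolean, would also make it easy to log which PGs caused the OSD to step down.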