On Wed, Dec 18, 2013 at 11:32 PM, Alexandre Oliva <oliva@xxxxxxx> wrote:
> On Dec 18, 2013, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
>> On Tue, Dec 17, 2013 at 3:36 AM, Alexandre Oliva <oliva@xxxxxxx> wrote:
>>> Here's an updated version of the patch, that makes it much faster than
>>> the earlier version, particularly when the gap between the latest osdmap
>>> known by the osd and the earliest osdmap known by the cluster is large.
>
>> Is this actually still necessary in the latest dumpling and emperor
>> branches?
>
> I can't tell for sure, I don't recall when I last rolled back to an old
> snapshot without this kind of patch.
>
>> I thought sufficiently-old OSDs would go through backfill with the new
>> PG members in order to get up-to-date without copying all the data.
>
> That much is true, for sure. The problem was getting to that point.
>
> If the latest osdmap known by the osd snapshot turns out to be older
> than the earliest map known by the monitors, the osd would give up
> because it couldn't make the ends meet: no incremental osdmaps were
> available in the cluster, and the osd refused to jump over gaps in the
> osdmap sequence. That's why I fudged the unavailable intermediate
> osdmaps as clones of the latest one known by the osd: then it would
> apply the incremental changes as nops until it got to an actual newer
> map, in which it would notice a number of changes, apply them all, and
> get on its happy way towards recovery over each of the newer osdmaps ;-)
>
> I can give a try without the patch if you tell me there's any chance the
> osd might now be able to jump over gaps in the osdmap sequence. That
> said, the posted patch, ugly as it is, is meant as a stopgap rather than
> as a proper solution; dealing with osdmap gaps rather than dying would
> be surely a more desirable implementation.

I don't remember exactly when it got changed, but I think so. Right Sam?
-Greg
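
For illustration only, the gap-filling workaround described in the quoted text can be sketched as a toy model. This is not the actual patch and not Ceph code: the function name `fill_osdmap_gap`, the use of plain integers for epochs, and the dict of maps are all hypothetical, and the sketch assumes the epochs still held by the monitors are contiguous.

```python
def fill_osdmap_gap(osd_latest_epoch, osd_latest_map, cluster_maps):
    """Toy model: list the maps an OSD would step through, epoch by epoch.

    osd_latest_epoch / osd_latest_map: the newest map the OSD snapshot has.
    cluster_maps: dict of epoch -> map for the epochs the monitors still hold
                  (assumed contiguous; a hypothetical stand-in for real osdmaps).
    """
    if not cluster_maps:
        return {}

    earliest = min(cluster_maps)
    newest = max(cluster_maps)
    sequence = {}

    # Epochs the cluster no longer has: reuse a clone of the OSD's own
    # latest map, so stepping through them changes nothing (a no-op).
    for epoch in range(osd_latest_epoch + 1, earliest):
        sequence[epoch] = osd_latest_map

    # From the earliest epoch the cluster still has, use the real maps;
    # the first of these is where all accumulated changes show up at once.
    for epoch in range(max(earliest, osd_latest_epoch + 1), newest + 1):
        sequence[epoch] = cluster_maps[epoch]

    return sequence


# The OSD stopped at epoch 5, but the monitors only kept epochs 10 and 11.
steps = fill_osdmap_gap(5, "map@5", {10: "map@10", 11: "map@11"})
```

In this example, epochs 6 through 9 map to the cloned "map@5", and epochs 10 and 11 map to the maps the cluster actually holds, so the sequence has no holes for the OSD to refuse.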