On Tue, Feb 19, 2013 at 2:52 PM, Alexandre Oliva <oliva@xxxxxxx> wrote:
> It recently occurred to me that I messed up an OSD's storage, and
> decided that the easiest way to bring it back was to roll it back to an
> earlier snapshot I'd taken (along the lines of clustersnap) and let it
> recover from there.
>
> The problem with that idea was that the cluster had advanced too much
> since the snapshot was taken: the latest OSDMap known by that snapshot
> was far behind the range still carried by the monitors.
>
> Determined to let that osd recover from all the data it already had,
> rather than restarting from scratch, I hacked up a "solution" that
> appears to work: with the patch below, the OSD will use the contents of
> an earlier OSDMap (presumably the latest one it has) in place of a newer
> OSDMap it can't get any more.
>
> A single run of the osd with this patch was enough for it to pick up the
> newer state and join the cluster; from then on, the patched osd was no
> longer necessary, and presumably should not be used except for this sort
> of emergency.
>
> Of course this can only work reliably if other nodes are up with the
> same or newer versions of each of the PGs (but then, rolling back the
> OSD to an older snapshot wouldn't be safe otherwise). I don't know of
> any other scenarios in which this patch will not recover things
> correctly, but unless someone far more familiar with ceph internals than
> I am vouches for it, I'd recommend using it only if you're really
> desperate to avoid a recovery from scratch. Before you run the patched
> ceph-osd, save snapshots of the other osds (as you probably already do,
> or you wouldn't have older snapshots to roll back to :-) and of the mon,
> and stop the mds or otherwise avoid changes you're not willing to lose,
> in case the patch doesn't work for you and you have to go back to the
> saved state and let the osd recover from scratch. If it works, lucky us;
> if it breaks, well, I told you :-)

Yeah, this ought to basically work, but it's very dangerous, since it
potentially breaks invariants about cluster state changes, etc. I wouldn't
use it if the cluster weren't otherwise healthy; other nodes breaking in
the middle of this operation could cause serious problems. I'd much prefer
that one just recover over the wire using the normal recovery paths... ;)
-Greg
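
[Editorial note: the patch itself is not reproduced above. As a rough,
self-contained C++ sketch of the idea only, and emphatically not
Alexandre's actual patch, the toy program below models an OSD's local
store of full maps as a std::map keyed by epoch; all names here
(stored_maps, get_map_or_fallback, the "osdmap@N" blobs) are made up for
illustration and do not correspond to Ceph's OSD internals. The point it
demonstrates is the fallback: when a requested epoch is newer than
anything the store holds and can no longer be fetched, reuse the contents
of the newest map available instead of failing, which is only sane under
the conditions Alexandre describes.]

    // Toy model of the OSDMap fallback idea; hypothetical names throughout.
    #include <iostream>
    #include <map>
    #include <string>

    using epoch_t = unsigned;

    // Stand-in for the OSD's local store of full maps, keyed by epoch.
    std::map<epoch_t, std::string> stored_maps = {
        {10, "osdmap@10"}, {11, "osdmap@11"}, {12, "osdmap@12"},
    };

    // Return the map for epoch e if we have it; otherwise fall back to
    // the newest map we do have (the emergency case discussed above).
    std::string get_map_or_fallback(epoch_t e)
    {
        auto it = stored_maps.find(e);
        if (it != stored_maps.end())
            return it->second;                 // normal path: epoch present
        // Fallback: substitute the latest stored map for the missing epoch.
        return stored_maps.rbegin()->second;
    }

    int main()
    {
        std::cout << get_map_or_fallback(11) << "\n";  // present: osdmap@11
        std::cout << get_map_or_fallback(42) << "\n";  // trimmed/newer: osdmap@12
    }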