It seems to me that the surviving OSDs still remember all of the osdmap and pgmap history back to "last epoch started" for all of their PGs. Isn't this enough to enable reconstruction of all of the pgmaps and osdmaps required to find any copy of any currently stored object?

My history has given me biases, but I prefer reconstruction over snapshots because:

(a) it enables recovery from more catastrophic incidents (e.g. a bug has corrupted all of the monitor stores, or a fire has reduced all monitor nodes to slag)
(b) it is less likely to result in inconsistencies involving object updates made after the last snapshot
(c) the ability to reconstruct is a superset of the ability to audit, so we get consistency audits for free
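To make the idea a bit more concrete, here is a rough sketch of the kind of merge I have in mind. It is only a toy model, not Ceph code: I am assuming each surviving OSD can report a mapping from epoch number to some digest of the map it holds for that epoch, and the names (reconstruct_history, the "osd.N" keys, the digest strings) are made up for illustration. The same pass that rebuilds the history also reports gaps and disagreements, which is the "audit for free" point in (c).

```python
from typing import Dict, List, Tuple

# Hypothetical model: each surviving OSD reports {epoch: map_digest}.
# Nothing here is a real Ceph API; it only illustrates the merge.
OsdHistory = Dict[int, str]


def reconstruct_history(
    histories: Dict[str, OsdHistory],
) -> Tuple[Dict[int, str], List[int], List[int]]:
    """Merge per-OSD map histories into one epoch -> digest history.

    Returns (merged, missing_epochs, conflicting_epochs).
    """
    merged: Dict[int, str] = {}
    conflicts = set()
    for history in histories.values():
        for epoch, digest in history.items():
            if epoch in merged and merged[epoch] != digest:
                # Two OSDs disagree about this epoch: an audit finding.
                conflicts.add(epoch)
            else:
                merged.setdefault(epoch, digest)
    # Epochs inside the covered range that no surviving OSD remembers.
    missing = (
        [e for e in range(min(merged), max(merged) + 1) if e not in merged]
        if merged
        else []
    )
    return merged, missing, sorted(conflicts)


if __name__ == "__main__":
    reports = {
        "osd.0": {10: "a", 11: "b"},
        "osd.1": {11: "b", 13: "d"},
        "osd.2": {13: "x"},
    }
    merged, missing, conflicts = reconstruct_history(reports)
    print(merged, missing, conflicts)
```

In this toy run, epoch 12 comes back as missing and epoch 13 as a conflict; a real tool would have to decide how to resolve both, which I have not tried to address here.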
It tends to be a common source of discomfort among potential Ceph users that if their mons ever become unrecoverable, it's almost impossible to recover their data (compare to GlusterFS, where you can always pull data out of Gluster bricks unharmed, at least as long as you don't use striped volumes). With a file-backed mon store, I had hoped that eventually this might tie into btrfs snapshots, such that you would be able to roll back to a known good configuration in an emergency. With the switch to leveldb, I no longer foresee that ever happening. Mind sharing your thoughts on that?