Re: Comments on Ceph.com's blog article 'Ceph's New Monitor Changes'

Mark Kampe <mark.kampe@xxxxxxxxxxx> · Tue, 12 Mar 2013 10:54:05 -0700

It seems to me that the surviving OSDs still remember all of
the osdmap and pgmap history back to "last epoch started"
for all of their PGs.  Isn't this enough to enable reconstruction
of all of the pgmaps and osdmaps required to find any copy of
any currently stored object?

My history has given me biases, but I prefer reconstruction over
snapshots because:

 (a) it enables recovery from more catastrophic incidents
     (e.g. a bug has corrupted all of the monitor stores
     or a fire has reduced all monitor nodes to slag)

 (b) it is less likely to result in inconsistencies involving
     object updates after the last snapshot

 (c) the ability to reconstruct is a superset of the ability
     to audit, so we get consistency audits for free
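
To make the idea concrete, here is a minimal sketch of what such a
reconstruction pass might look like. It is not Ceph code and uses no
Ceph API: the data model (each surviving OSD reporting a dict of
epoch -> map digest) and the name reconstruct_history are assumptions
made purely for illustration. It merges the per-OSD histories, reports
epochs nobody remembers, and flags epochs where two OSDs disagree,
which is the "audit for free" part of point (c).

```python
from typing import Dict, List, Tuple

# Hypothetical model: each surviving OSD reports {epoch: map_digest}
# for the osdmap epochs it still holds. Not a real Ceph structure.
OsdHistory = Dict[int, str]


def reconstruct_history(
    histories: Dict[str, OsdHistory],
) -> Tuple[Dict[int, str], List[int], List[Tuple[int, str, str]]]:
    """Merge per-OSD map histories into one epoch-indexed history.

    Returns (merged, gaps, conflicts):
      merged    - epoch -> digest, first value seen wins
      gaps      - epochs between the oldest and newest that no OSD holds
      conflicts - (epoch, kept_digest, differing_digest) disagreements
    """
    merged: Dict[int, str] = {}
    conflicts: List[Tuple[int, str, str]] = []

    # Iterate in a fixed order so the result is deterministic.
    for _osd, history in sorted(histories.items()):
        for epoch, digest in history.items():
            if epoch not in merged:
                merged[epoch] = digest
            elif merged[epoch] != digest:
                # Two OSDs remember different maps for the same epoch:
                # this is an audit finding, not something to paper over.
                conflicts.append((epoch, merged[epoch], digest))

    if merged:
        lo, hi = min(merged), max(merged)
        gaps = [e for e in range(lo, hi + 1) if e not in merged]
    else:
        gaps = []
    return merged, gaps, conflicts


if __name__ == "__main__":
    surviving = {
        "osd.0": {10: "a", 11: "b"},
        "osd.1": {11: "b", 13: "d"},
        "osd.2": {13: "x"},
    }
    merged, gaps, conflicts = reconstruct_history(surviving)
    print("merged:", merged)
    print("missing epochs:", gaps)
    print("conflicts:", conflicts)
```

Whether the gaps matter depends on the question being asked: only the
epochs back to "last epoch started" for each PG would need to be
present, so a real implementation would check coverage per PG rather
than over one global range as this sketch does.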

It tends to be a common source of discomfort among potential Ceph
users that if their mons ever become unrecoverable, it's almost
impossible to recover their data (compare to GlusterFS, where you can
always pull data out of Gluster bricks unharmed, at least as long as
you don't use striped volumes). With a file-backed mon store, I had
hoped that eventually this might tie into btrfs snapshots such that
you would be able to roll back to a known good configuration
in an emergency. With the switch to leveldb, I no longer foresee that
ever happening. Mind sharing your thoughts on that?