I'm new to Ceph, and considering using it to store a bunch of static files in the RADOS Gateway. Our files are all versioned, so we never modify them; we only add new files and delete unused ones.
I'm trying to figure out how to back everything up, to protect against administrative and application errors.
I'm thinking about building one Ceph cluster that spans my primary and backup datacenters, with a CRUSH rule that stores two replicas in each datacenter (a rough sketch of the rule is below).
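This assumes a crushmap whose hierarchy defines "datacenter" buckets and pools running with size 4; the names are placeholders, not a tested rule:

    rule rgw_two_dc {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            # pick 2 datacenters, then 2 OSDs on distinct hosts in each
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn 2 type host
            step emit
    }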
I want to use BtrFS snapshots, along the lines of http://blog.rot13.org/2010/02/using_btrfs_snapshots_for_incremental_backup.html, but automated and with cleanup. I'm doing something similar now on my NFS servers with ZFS and a tool called zfs-snapshot-mgmt; a rough sketch of the cron job I'm picturing follows.
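Something like this, per OSD; the paths, the one-OSD layout, and the 7-day retention are placeholders, and I realize a snapshot of a live OSD is crash-consistent at best:

    #!/bin/bash
    # Hourly cron job: snapshot one OSD's btrfs subvolume, then prune.
    # Assumes /dev/sdb1 is mounted at /srv with subvolumes osd.0 and snap/.
    STAMP=$(date +%Y%m%d-%H%M)

    # Read-only snapshot; snap/ lives on the same btrfs filesystem.
    btrfs subvolume snapshot -r /srv/osd.0 "/srv/snap/osd.0@$STAMP"

    # The YYYYmmdd-HHMM stamp sorts chronologically, so a plain string
    # compare finds snapshots older than the cutoff.
    CUTOFF=$(date -d '7 days ago' +%Y%m%d-%H%M)
    for s in /srv/snap/osd.0@*; do
        [ -d "$s" ] || continue
        [[ "${s##*@}" < "$CUTOFF" ]] && btrfs subvolume delete "$s"
    done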
I read that only XFS is recommended for production clusters, since BtrFS itself is still beta. Any idea how long until BtrFS is usable in production? I'd prefer to run Ceph on ZFS, but I see there are some outstanding issues in the tracker. Is anybody doing Ceph on ZFS in production? ZFS itself seems to be farther along than BtrFS. Are there plans to make ZFS a first-class supported filesystem for Ceph?
In the event of an operator or code error, I would mount the correct BtrFS snapshot on all nodes in the backup datacenter, somewhere like /var/lib/ceph.restore/. Then I'd make a copy of ceph.conf and start building a temporary cluster, made up of only the backup datacenter machines, that runs on a non-standard port. The normal cluster would stay up and running. Once the temporary cluster is up, I'd manually restore the RADOS Gateway objects that needed to be restored. The rough shape of what I mean is below.
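Per node, I'm imagining something like this; every name, path, and port here is a placeholder, and I know the restored mon stores would still hold the original monmap, which I'd presumably have to rewrite with monmaptool to move the monitors to the new port:

    # Clone the chosen read-only snapshot to a writable subvolume and
    # mount it where the temporary cluster expects its data.
    btrfs subvolume snapshot /srv/snap/osd.0@20130611-0300 /srv/osd.0-restore
    mkdir -p /var/lib/ceph.restore/osd/ceph-0
    mount -o subvol=osd.0-restore /dev/sdb1 /var/lib/ceph.restore/osd/ceph-0

    # Start daemons against a copy of ceph.conf that lists only the
    # backup-DC hosts, points at the /var/lib/ceph.restore paths, and
    # puts the monitors on a non-standard port.
    ceph-mon -c /etc/ceph/ceph-restore.conf -i a
    ceph-osd -c /etc/ceph/ceph-restore.conf -i 0
    radosgw -c /etc/ceph/ceph-restore.conf -n client.radosgw.restore

From there I'd pull the needed objects back out through the temporary radosgw, or with rados get against the .rgw.buckets pool, and push them into the live cluster.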
If there was ever a full-cluster problem, say I did something stupid like rados rmpool metadata, I'd shut down the whole cluster, revert all of the BtrFS partitions to the last known good snapshot, and re-format all of the XFS partitions. Then I'd start the cluster up again and let Ceph replicate everything back onto the freshly formatted partitions. I'd lose recent data, but that's better than losing all of the data.
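On each node, I'm picturing roughly this (again, all names are placeholders, and it's untested):

    #!/bin/bash
    # Rough per-node rollback; assumes one btrfs OSD (osd.0 on /dev/sdb1,
    # mounted at /srv) and one xfs OSD (osd.1 on /dev/sdc1).
    service ceph stop

    # btrfs: clone the last good read-only snapshot to a writable
    # subvolume and mount it in place of the live one.
    umount /var/lib/ceph/osd/ceph-0
    btrfs subvolume snapshot /srv/snap/osd.0@lastgood /srv/osd.0-rollback
    mount -o subvol=osd.0-rollback /dev/sdb1 /var/lib/ceph/osd/ceph-0

    # xfs: nothing to revert to, so hand Ceph an empty, re-initialized
    # OSD and let replication backfill it.
    umount /var/lib/ceph/osd/ceph-1
    mkfs.xfs -f /dev/sdc1
    mount /dev/sdc1 /var/lib/ceph/osd/ceph-1
    ceph-osd -i 1 --mkfs --mkkey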
Obviously, both of these scenarios would need a lot of testing and many practice runs before they're viable. Has anybody tried this before? If not, do you see any problems with the theory?