Hey,

within this mail, "data" refers to the RADOS chunks, i.e. the actual data behind the fs/object/block storage.

I was thinking about different scenarios that could lead to data loss:

1. The usual stupid customer deleting some important data.
2. The not so usual, totally corrupted cluster after an upgrade or the like.
3. The fun to think about "datacenter struck by [disaster] - nothing left" scenario.

While thinking about these scenarios, I wondered how these disasters and the resulting data loss could be prevented. Telling customers that their data is lost, be it self-inflicted or nature-inflicted, is nothing you want or should need to do. But what are the technical solutions to provide another layer of disaster recovery (not just one datacenter with n replicas)?

Some ideas that came to mind:

1. Snapshotting (ability to get back user-deleted files + revert to an old state after corruption)
2. Offsite backup (ability to recover from a lost datacenter)

With these ideas, a few problems came to mind. Is it cost-effective to back up the whole cluster? (That would probably back up all replicas, which is not good at all.) Is there a way to snapshot the current state and back it up to some offsite server array, which could be another Ceph cluster or a NAS? Do you really want to snapshot the non-readable Ceph objects from RADOS? Shouldn't a backup always be readable?

The simplest solution, which darkfaded from IRC came up with, was using special replicas. Additional replicas that only sync hourly, daily or monthly and detach after the sync could be a solution. But how could that be done?

Some benefits of this solution:

1. Readable, because it could be a fully functioning cluster. Doable? Is there a need for replication of gateways etc., or could that be integrated within a special replica backup?
2. Easy recovery, just make the needed replica the "master".
3. No new system. Ceph in and out \o/
4. Offsite backup possibility.
5. Versioned states via different replicas (hourly, daily, monthly).

Some problems:

1. Strain on the Ceph cluster when the sync is done for each special replica
2. Additional disk space needed (could be double the already used amount, when using 3 replicas with one current, one daily, one monthly replica)
3. More costs
4. More complex solution?

Could someone shed some light on how to have replicas without the write having to be acknowledged by every replica, so that they are only a mirror instead of a full replica? Could this replica-based backup be used as a current snapshot in another datacenter? Wouldn't that be the async feature, which isn't really possible yet?

I hope this mail is not too cluttered, and I'm looking forward to the thread about it. Hopefully we can not only collect some ideas and solutions, but also hear about current implementations from some bigger players.

Cheers
Michael

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
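
P.S. To put rough numbers on the disk space concern (problem 2 above), here is a minimal Python sketch. It assumes that every special replica holds a full copy of the data, and it reads the "3 replicas" example as three special replicas (current, daily, monthly) on top of three regular ones. Both points are assumptions, as are the function names and the 100 TB figure, which are purely illustrative.

```python
def raw_usage(logical_tb, regular_replicas, special_replicas=0):
    """Raw capacity in TB, assuming every replica stores a full copy.

    Assumption: special (hourly/daily/monthly) replicas are complete
    copies, not deltas, so each one costs as much as a regular replica.
    """
    return logical_tb * (regular_replicas + special_replicas)


def overhead_factor(regular_replicas, special_replicas):
    """How much raw usage grows compared to regular replicas only."""
    return (regular_replicas + special_replicas) / regular_replicas


if __name__ == "__main__":
    logical_tb = 100  # hypothetical amount of user data

    without_backup = raw_usage(logical_tb, regular_replicas=3)
    with_backup = raw_usage(logical_tb, regular_replicas=3, special_replicas=3)

    print("raw usage without special replicas:", without_backup, "TB")
    print("raw usage with current/daily/monthly:", with_backup, "TB")
    print("overhead factor:", overhead_factor(3, 3))
```

Under these assumptions the raw usage doubles, which is where the "double the already used amount" estimate comes from. If special replicas could store only changed objects, the factor would be lower, but that depends on how such a sync would actually be implemented.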