On Wed, Feb 8, 2012 at 19:14, Josh Durgin <josh.durgin@xxxxxxxxxxxxx> wrote:
> It's possible to do what the current repair code does
> automatically, but this would be a bad idea since it just takes
> the first osd (with primary before replicas) to have the object
> as authoritative, and copies it to all the relevant osds. If the
> primary has a corrupt copy, this corruption will spread to other
> osds. In your case, since you removed the object entirely, repair
> could correct it.

At the risk of stating the obvious: if you have >=3 copies, you could
hash them all and let the majority decide which is the "good" copy.
An admin could do this manually, just deleting the bad one and
letting scrub repair it, and later on we might be able to automate
it. I'm not sure if Dynamo's/Cassandra's anti-entropy feature does
this, or if it's a simple "master overwrites slaves", and I realize
the multi-party communication is sort of hard to coordinate, but it's
definitely possible. I loves me some Merkle trees.

Of course, there might be cases where e.g. all 3 replicas have
different content. In many ways, getting a hash stored alongside the
object is significantly better, and might be a better route to go --
our objects are big enough, as opposed to typical Dynamo/Cassandra
cells that are often smaller than a sha1.
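
To make the majority idea concrete, here is a rough sketch of the vote itself. It is not what the repair code does today, and nothing in it is a Ceph API: the function name pick_authoritative, the dict of osd id to object bytes, and the choice of sha1 are all assumptions made for illustration. It also assumes the copies have already been gathered in one place, which sidesteps the coordination problem entirely.

```python
import hashlib
from collections import Counter


def pick_authoritative(replicas):
    """Vote on which replica content is the "good" one.

    replicas: dict mapping a (hypothetical) osd id to that osd's copy
    of the object, as bytes.

    Returns (digest, good_osds, bad_osds) when a strict majority of
    copies share the same sha1, or None when there is no majority
    (e.g. all 3 replicas differ, or 2 copies disagree).
    """
    if not replicas:
        return None

    # Hash every copy; only the digests need to be compared.
    digests = {osd: hashlib.sha1(data).hexdigest()
               for osd, data in replicas.items()}

    # Find the most common digest and how many copies carry it.
    digest, votes = Counter(digests.values()).most_common(1)[0]

    # Require a strict majority; a tie is not good enough to repair from.
    if votes * 2 <= len(replicas):
        return None

    good = sorted(osd for osd, d in digests.items() if d == digest)
    bad = sorted(osd for osd, d in digests.items() if d != digest)
    return digest, good, bad
```

With three copies where osd 2 has diverged, pick_authoritative({0: b"abc", 1: b"abc", 2: b"abX"}) reports osds 0 and 1 as good and osd 2 as bad, so an admin would know which copy to delete before letting scrub repair it. When it returns None, the vote cannot help, and that is exactly the case where a hash stored alongside the object would.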