On Thu, 8 Jan 2015 21:17:12 -0700 Robert LeBlanc wrote:

> On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote:
> > Which of course currently means a strongly consistent lockup in these
> > scenarios. ^o^
>
> That is one way of putting it.
>
If I had the time, and more importantly the talent, to help with code, I'd
do so. Failing that, pointing out the often painful truth is something I
can do.

> > Slightly off-topic and snarky: that strong consistency is of course of
> > limited use when, in the case of a corrupted PG, Ceph basically asks you
> > to toss a coin.
> > As in minor corruption, impossible for a mere human to tell which
> > replica is the good one, because one OSD is down and the 2 remaining
> > ones differ by one bit or so.
>
> This is where checksumming is supposed to come in. I think Sage has been
> leading that initiative.

Yeah, I'm aware of that effort. Of course, in the meantime even a very
simple majority vote would be most welcome and helpful in nearly all cases
(with 3 replicas available).
One wonders if this is basically acknowledging that while offloading some
things like checksums to the underlying layer/FS is desirable from a
codebase/effort/complexity view, neither BTRFS nor ZFS is fully production
ready, and won't be for some time.

> Basically, when an OSD reads an object it should be able to tell if
> there was bit rot by hashing what it just read and checking it against
> the MD5SUM it computed when it first received the object. If it doesn't
> match, it can ask another OSD until it finds one that matches.
>
> This provides a number of benefits:
>
> 1. Protects against bit rot. Checked on read and on deep scrub.
> 2. Automatically recovers the correct version of the object.
> 3. If the client computes the MD5SUM before sending the data over the
> wire, its integrity can be guaranteed through the memory of several
> machines/devices/cables/etc.
> 4. Getting by with "size" 2 is less risky for those who really want to
> do that.
>
> With all these benefits, there is a trade-off associated with it, mostly
> CPU. However, with the inclusion of AES in silicon, it may not be a huge
> issue now. But I'm neither a programmer nor familiar enough with that
> aspect of the Ceph code to be authoritative in any way.

Yup, all very useful and pertinent points.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
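A minimal Python sketch of the two repair strategies discussed in this thread: verify-on-read against a checksum recorded at write time (falling back to the next replica on mismatch), and a plain majority vote for when no stored checksum exists. This is not Ceph code. The in-memory "OSD" dicts and the function names `md5_of`, `read_verified` and `majority_vote` are hypothetical, and MD5 is used only because the thread mentions it. Note that the vote requires a strict majority, so with one OSD down and two differing copies it returns nothing, which is exactly the coin-toss case described above.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical model: each "OSD" is a dict mapping object name -> stored bytes.
# Illustration only; this is not Ceph's API or on-disk format.
Osd = Dict[str, bytes]


def md5_of(data: bytes) -> str:
    """Hex MD5 digest, the checksum mentioned in the discussion."""
    return hashlib.md5(data).hexdigest()


def read_verified(name: str, replicas: List[Osd], expected: str) -> Optional[bytes]:
    """Return the first replica copy whose MD5 matches the checksum stored at write time."""
    for osd in replicas:
        data = osd.get(name)
        if data is not None and md5_of(data) == expected:
            return data  # this copy passed verification
        # mismatch or missing: try the next replica
    return None  # every available copy failed verification


def majority_vote(name: str, replicas: List[Osd]) -> Optional[bytes]:
    """Without a stored checksum, pick the copy held by a strict majority of replicas."""
    copies = [osd[name] for osd in replicas if name in osd]
    if not copies:
        return None
    data, votes = Counter(copies).most_common(1)[0]
    # A tie (e.g. 1 vs 1 with one OSD down) is not a majority: refuse to guess.
    return data if votes * 2 > len(copies) else None


if __name__ == "__main__":
    good = b"hello object"
    bad = b"hello objecu"  # 't' (0x74) -> 'u' (0x75): a single flipped bit
    stored = md5_of(good)  # checksum recorded when the object was first written
    osds = [{"obj": bad}, {"obj": good}, {"obj": good}]
    print(read_verified("obj", osds, stored) == good)  # True: bad copy skipped
    print(majority_vote("obj", osds) == good)          # True: 2 of 3 agree
    print(majority_vote("obj", osds[:2]))              # None: 1 vs 1, no majority
```

The checksum path can recover even with only two copies left, while the vote needs at least three live replicas to break a disagreement.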