Re: Ceph PG Incomplete = Cluster unusable

On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote:
> Which of course currently means a strongly consistent lockup in these
> scenarios. ^o^

That is one way of putting it.

> Slightly off-topic and snarky, that strong consistency is of course of
> limited use when in the case of a corrupted PG Ceph basically asks you to
> toss a coin.
> As in minor corruption, impossible for a mere human to tell which
> replica is the good one, because one OSD is down and the 2 remaining ones
> differ by one bit or so.

This is where checksumming is supposed to come in. I think Sage has been leading that initiative. Basically, when an OSD reads an object, it should be able to detect bit rot by hashing what it just read and comparing the result with the MD5 sum it computed when it first received the object. If they don't match, it can ask other OSDs until it finds a copy that does match.
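As a rough illustration of that read path, here is a minimal Python sketch. It is not Ceph code: the Replica class and read_verified function are hypothetical names made up for this example, and it assumes each copy carries the MD5 digest recorded at write time.

```python
import hashlib
from typing import Iterable, Optional


class Replica:
    """Hypothetical stand-in for one OSD's copy of an object (not a Ceph API)."""

    def __init__(self, name: str, data: bytes, stored_md5: str):
        self.name = name
        self.data = data
        self.stored_md5 = stored_md5  # digest recorded when the object was written

    def is_intact(self) -> bool:
        # Re-hash what is stored now and compare with the write-time digest.
        return hashlib.md5(self.data).hexdigest() == self.stored_md5


def read_verified(replicas: Iterable[Replica]) -> Optional[bytes]:
    """Return the data from the first replica whose checksum still matches."""
    for replica in replicas:
        if replica.is_intact():
            return replica.data
    return None  # no replica passed verification


original = b"object payload"
digest = hashlib.md5(original).hexdigest()
rotted = Replica("osd.1", b"object pazload", digest)  # one corrupted byte
good = Replica("osd.2", original, digest)
result = read_verified([rotted, good])  # skips osd.1, returns the copy from osd.2
```

With a stored digest there is no coin toss: the copy that no longer matches its own checksum is the bad one, even when only two replicas are left.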

This provides a number of benefits:
  1. Protection against bit rot, checked on read and on deep scrub.
  2. Automatic recovery of the correct version of the object.
  3. If the client computes the MD5 sum before sending the data over the wire, the data can be verified end to end, through the memory of several machines, devices, cables, etc.
  4. Getting by with "size" 2 is less risky for those who really want to do that.
With all these benefits comes a trade-off, mostly CPU. However, with the inclusion of AES in silicon, it may not be a huge issue now. But I'm not a programmer, and I'm not familiar enough with this part of the Ceph code to be authoritative in any way.
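For point 3 in the list above, the end-to-end idea could look something like the following sketch. Again this is only an assumption-laden illustration, not how Ceph does it: client_prepare and osd_accept are hypothetical names, and the "wire" is just a function argument.

```python
import hashlib
from typing import Tuple


def client_prepare(payload: bytes) -> Tuple[bytes, str]:
    # Hypothetical client side: hash the data before it leaves the client.
    return payload, hashlib.md5(payload).hexdigest()


def osd_accept(payload: bytes, client_md5: str) -> bool:
    # Hypothetical OSD side: re-hash what arrived and compare it with the
    # digest the client computed, so damage in transit is caught on receipt.
    return hashlib.md5(payload).hexdigest() == client_md5


data, md5sum = client_prepare(b"some object data")
accepted = osd_accept(data, md5sum)                      # arrived unchanged
rejected = not osd_accept(b"some object dat\x00", md5sum)  # damaged in transit
```

The same digest could then be stored with the object and reused for the read-time and deep-scrub checks.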
