On Mon, Feb 20, 2017 at 02:12:52PM PST, Gregory Farnum spake thusly:
> Hmm, I went digging in and sadly this isn't quite right.

Thanks for looking into this! This is the answer I was afraid of. Aren't
all of those blog entries which talk about using repair, and the ceph docs
themselves, putting people's data at risk? It seems like the only
responsible way to deal with inconsistent PGs is to dig into the osd log,
look at the reason for the inconsistency, examine the data on disk,
determine which copy is good and which is bad, and delete the bad one?

> The code has a lot of internal plumbing to allow more smarts than were
> previously feasible and the erasure-coded pools make use of them for
> noticing stuff like local corruption. Replicated pools make an attempt
> but it's not as reliable as one would like and it still doesn't
> involve any kind of voting mechanism.

This is pretty surprising. I would have thought a best-two-out-of-three
voting mechanism in a triple-replicated setup would be the obvious way to
go. It must be more difficult to implement than I suppose.

> A self-inconsistent replicated primary won't get chosen. A primary is
> self-inconsistent when its digest doesn't match the data, which
> happens when:
> 1) the object hasn't been written since it was last scrubbed, or
> 2) the object was written in full, or
> 3) the object has only been appended to since the last time its digest
> was recorded, or
> 4) something has gone terribly wrong in/under LevelDB and the omap
> entries don't match what the digest says should be there.

At least there's some sort of basic heuristic which attempts to do the
right thing, even if the whole process isn't as thorough as it could be.

> David knows more and correct if I'm missing something. He's also
> working on interfaces for scrub that are more friendly in general and
> allow administrators to make more fine-grained decisions about
> recovery in ways that cooperate with RADOS.

These will be very welcome improvements!
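
For what it's worth, the two-out-of-three vote I had in mind could be
sketched roughly as follows. This is purely illustrative Python and not how
Ceph's scrub or repair works (per the above, replicated pools do no voting
at all). The function name `majority_vote`, the OSD labels, and the choice
of SHA-256 over the whole object contents are all assumptions made up for
the example.

```python
import hashlib
from collections import Counter


def majority_vote(replicas):
    """Pick the replica contents that a strict majority of copies agree on.

    `replicas` is a hypothetical mapping of OSD label -> object bytes.
    Returns (good_osds, bad_osds), or None when no strict majority exists,
    in which case a human has to look at the copies.
    """
    if not replicas:
        return None

    # Digest every copy so that copies can be compared cheaply.
    digests = {osd: hashlib.sha256(data).hexdigest()
               for osd, data in replicas.items()}

    # Count how many copies share each digest.
    winner, votes = Counter(digests.values()).most_common(1)[0]

    # Require a strict majority; a tie or all-different result is undecidable.
    if votes * 2 <= len(replicas):
        return None

    good = sorted(osd for osd, d in digests.items() if d == winner)
    bad = sorted(osd for osd, d in digests.items() if d != winner)
    return good, bad
```

With three copies where one has rotted, e.g.
`majority_vote({"osd.0": b"data", "osd.1": b"data", "osd.2": b"dat4"})`,
the two matching copies win and `osd.2` is flagged. If all three differ,
the function returns None rather than guessing. Note that a simple vote
like this says nothing about omap entries or in-flight writes, which is
presumably part of why the real thing is harder than it looks.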
-- Tracy Reed
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com