Re: How safe is ceph pg repair these days?

On Mon, Feb 20, 2017 at 02:12:52PM PST, Gregory Farnum spake thusly:
> Hmm, I went digging in and sadly this isn't quite right. 

Thanks for looking into this! This is the answer I was afraid of. Aren't
all of those blog entries that recommend repair, and the Ceph docs
themselves, putting people's data at risk? It seems like the only
responsible way to deal with inconsistent PGs is to dig into the OSD
log, find the reason for the inconsistency, examine the data on disk,
determine which copy is good and which is bad, and delete the bad one.

> The code has a lot of internal plumbing to allow more smarts than were
> previously feasible and the erasure-coded pools make use of them for
> noticing stuff like local corruption. Replicated pools make an attempt
> but it's not as reliable as one would like and it still doesn't
> involve any kind of voting mechanism.

This is pretty surprising. I would have thought a best-two-out-of-three
voting mechanism in a triple-replicated setup would be the obvious way
to go. It must be more difficult to implement than I suppose.
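
To make concrete what I mean by voting, here is a minimal sketch of
best-two-out-of-three over per-replica checksums. It is purely
illustrative and is not how Ceph's repair works: the function name
majority_digest, the OSD labels, and the use of SHA-256 are all my own
assumptions.

```python
import hashlib
from collections import Counter


def majority_digest(replicas):
    """Vote across replicas of one object.

    `replicas` maps a hypothetical replica name (e.g. "osd.1") to the raw
    bytes that replica holds. Returns (winning_digest, outvoted_names), or
    (None, all_names) when no digest has a strict majority.
    """
    if not replicas:
        return None, []
    # Checksum every copy independently (SHA-256 is an assumption here).
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    counts = Counter(digests.values())
    winner, votes = counts.most_common(1)[0]
    # Require a strict majority; a tie means nobody can be trusted.
    if votes * 2 <= len(replicas):
        return None, sorted(digests)
    outvoted = sorted(name for name, d in digests.items() if d != winner)
    return winner, outvoted


replicas = {
    "osd.1": b"good data",
    "osd.2": b"good data",
    "osd.3": b"bit-rotted data",
}
winner, outvoted = majority_digest(replicas)
print(outvoted)
```

With two matching copies and one damaged copy, the damaged one is the
only replica outvoted. With only two replicas that disagree there is no
majority at all, which may be part of why this is harder than it looks.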

> A self-inconsistent replicated primary won't get chosen. A primary is
> self-inconsistent when its digest doesn't match the data, which
> happens when:
> 1) the object hasn't been written since it was last scrubbed, or
> 2) the object was written in full, or
> 3) the object has only been appended to since the last time its digest
> was recorded, or
> 4) something has gone terribly wrong in/under LevelDB and the omap
> entries don't match what the digest says should be there.

At least there's some sort of basic heuristic which attempts to do the
right thing even if the whole process isn't as thorough as it could be.
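
As I read the description above, the check amounts to something like
the following sketch. Everything in it is my assumption rather than
Ceph code: the Copy type, the function names, and zlib.crc32 standing
in for whatever checksum Ceph actually records.

```python
import zlib
from typing import NamedTuple, Optional


class Copy(NamedTuple):
    osd: str                        # hypothetical label, e.g. "osd.0"
    data: bytes                     # object contents as read from disk
    recorded_digest: Optional[int]  # None when no digest was recorded


def is_self_inconsistent(copy):
    """True only when a recorded digest exists and disagrees with the data."""
    if copy.recorded_digest is None:
        return False  # nothing to compare against, so damage goes unnoticed
    return zlib.crc32(copy.data) != copy.recorded_digest


def pick_authoritative(copies):
    """Return the first copy (primary listed first) that is not self-inconsistent."""
    for copy in copies:
        if not is_self_inconsistent(copy):
            return copy
    return None


good = b"object payload"
copies = [
    Copy("osd.0", b"object pay1oad", zlib.crc32(good)),  # primary, corrupted
    Copy("osd.1", good, zlib.crc32(good)),
    Copy("osd.2", good, None),
]
chosen = pick_authoritative(copies)
print(chosen.osd)
```

Here the corrupted primary is skipped because its recorded digest no
longer matches its data. Note that a corrupted primary with no recorded
digest would still be chosen in this sketch, which is the gap that
worries me.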

> David knows more and correct if I'm missing something. He's also
> working on interfaces for scrub that are more friendly in general and
> allow administrators to make more fine-grained decisions about
> recovery in ways that cooperate with RADOS.

These will be very welcome improvements! 

-- 
Tracy Reed
