Re: Ceph inconsistency after deep-scrub

On 11/21/2014 10:46 PM, Paweł Sadowski wrote:
> On 21.11.2014 at 20:12, Gregory Farnum wrote:
>> On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> During a deep-scrub Ceph discovered an inconsistency between OSDs on my
>>> cluster (size 3, min_size 2). I found the broken object and calculated
>>> its md5sum on each OSD (osd.195 is the acting primary):
>>>  osd.195 - md5sum_aaaa
>>>  osd.40 - md5sum_aaaa
>>>  osd.314 - md5sum_bbbb
>>>
>>> I ran ceph pg repair and Ceph reported that the repair completed
>>> successfully. I checked the md5sums of the object again:
>>>  osd.195 - md5sum_bbbb
>>>  osd.40 - md5sum_bbbb
>>>  osd.314 - md5sum_bbbb
>>>
>>> This is a bit odd. How does Ceph decide which copy is the correct one?
>>> Based on the last modification time/sequence number (or something
>>> similar)? If yes, then why would that version be stored on one node
>>> only? If not, then why did Ceph select osd.314's copy as the correct
>>> one? What would happen if osd.314 went down? Would Ceph return wrong
>>> (old?) data, even with three copies and no failure in the cluster?
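
For the record, the check above was roughly the following (the pool/object
names are just examples, and the on-disk paths assume a Filestore OSD):

  # find which PG and which OSDs hold the object
  ceph osd map <pool> <object-name>
  # on each of the listed OSD hosts, locate the file and checksum it
  find /var/lib/ceph/osd/ceph-195/current/<pgid>_head/ -name '*<object-name>*'
  md5sum /var/lib/ceph/osd/ceph-195/current/<pgid>_head/<object-file>
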
>> Right now, Ceph recovers replicated PGs by pushing the primary's copy
>> to everybody. There are tickets to improve this, but for now it's best
>> if you handle this yourself by moving the right things into place, or
>> removing the primary's copy if it's incorrect before running the
>> repair command. This is why it doesn't do repair automatically.
>> -Greg
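
If I read that right, a manual fix would be something along these lines,
run on the OSD holding the incorrect copy (Filestore layout assumed, the
pgid/object paths are placeholders, and the service commands depend on the
init system in use):

  # stop the affected OSD and flush its journal
  service ceph stop osd.314
  ceph-osd -i 314 --flush-journal
  # move the bad object file out of the PG directory (keep it as a backup)
  mv /var/lib/ceph/osd/ceph-314/current/<pgid>_head/<object-file> /root/bad-object.bak
  # start the OSD again and let repair push the good copy back
  service ceph start osd.314
  ceph pg repair <pgid>
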
> But in my case Ceph used the non-primary's copy to repair the data, while
> the two other OSDs had the same data (and one of them was the primary).
> That should not happen.
>
> Besides that, there should be a big red warning in the documentation[1]
> regarding /ceph pg repair/.
>
> 1:
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent

Do any of you use the "filestore_sloppy_crc" option? It's not documented
(on purpose, I assume), but it lets the OSD detect bad/broken data on disk
(and crash).
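
In case anyone wants to try it, it is an OSD option, so something along
these lines should work (the block size value below is, I believe, the
default; check the exact names/values on a running OSD via the admin socket):

  # check whether it is already enabled on a running OSD
  ceph daemon osd.0 config get filestore_sloppy_crc
  # to enable it, add to the [osd] section of ceph.conf and restart the OSD:
  #   filestore sloppy crc = true
  #   filestore sloppy crc block size = 65536
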

Cheers,
PS
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




