On 11/21/2014 10:46 PM, Paweł Sadowski wrote:
> On 21.11.2014 at 20:12, Gregory Farnum wrote:
>> On Fri, Nov 21, 2014 at 2:35 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> During a deep scrub Ceph discovered an inconsistency between OSDs on my
>>> cluster (size 3, min_size 2). I found the broken object and calculated
>>> its md5sum on each OSD (osd.195 is the acting primary):
>>> osd.195 - md5sum_aaaa
>>> osd.40 - md5sum_aaaa
>>> osd.314 - md5sum_bbbb
>>>
>>> I ran ceph pg repair and Ceph reported that everything went
>>> OK. I checked the md5sums of the objects again:
>>> osd.195 - md5sum_bbbb
>>> osd.40 - md5sum_bbbb
>>> osd.314 - md5sum_bbbb
>>>
>>> This is a bit odd. How does Ceph decide which copy is the correct one?
>>> Based on the last modification time/sequence number (or similar)? If so,
>>> why can that be stored on only one node? If not, why did Ceph select
>>> osd.314 as the correct one? And what would happen if osd.314 went down?
>>> Would Ceph return wrong (old?) data, even with three copies and no
>>> failure in the cluster?
>> Right now, Ceph recovers replicated PGs by pushing the primary's copy
>> to everybody. There are tickets to improve this, but for now it's best
>> if you handle this yourself by moving the right things into place, or
>> removing the primary's copy if it's incorrect before running the
>> repair command. This is why it doesn't do repair automatically.
>> -Greg
> But in my case Ceph used a non-primary copy to repair the data, while the
> two other OSDs held identical data (and one of them was the primary). That
> should not happen.
>
> Besides that, there should be a big red warning in the documentation[1]
> regarding /ceph pg repair/.
>
> 1:
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent

Do any of you use the "filestore_sloppy_crc" option? It's not documented (on purpose, I assume), but it lets the OSD detect bad/broken data (and crash).
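For anyone wanting to repeat the check described above, the replica comparison can be sketched roughly like this. The helper below is a generic checksum comparison; the object paths are hypothetical examples (on a real filestore cluster you would first locate the object's file under each OSD's /var/lib/ceph/osd/ceph-<id>/current/<pg>_head/ directory, and the exact layout depends on your release):

```shell
#!/bin/sh
# Sketch: given local copies of the same object pulled from several OSDs,
# report whether all replicas have the same md5sum.

compare_replicas() {
    # Take the first file's md5sum as the reference value.
    first=$(md5sum "$1" | cut -d' ' -f1)
    for f in "$@"; do
        sum=$(md5sum "$f" | cut -d' ' -f1)
        if [ "$sum" != "$first" ]; then
            echo "MISMATCH: $f has $sum (expected $first)"
            return 1
        fi
    done
    echo "MATCH: all replicas have md5sum $first"
}

# Hypothetical usage, with copies fetched from osd.195, osd.40 and osd.314:
# compare_replicas obj.osd195 obj.osd40 obj.osd314
```

Note this only tells you *which* copies differ, not which one is correct; as Greg says, you still have to decide that yourself before running the repair.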
Cheers,
PS
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com