Re: Scrub Error / How does ceph pg repair work?

Hello,

I can only nod emphatically to what Robert said: don't issue repairs
unless you
a) don't care about the data, or
b) have verified that your primary OSD is good.

See this for some details on how to establish which replica(s) are
actually good or not:
http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/
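Along the lines of that post, a rough sketch of the manual check (the PG id, object name and OSD paths below are hypothetical; take the real ones from "ceph health detail" and "ceph pg map" on your cluster):

```shell
#!/bin/sh
# Sketch: figure out which replica of an object is the bad one.
#
# On the monitor, find the inconsistent PG and the OSDs holding it:
#   ceph health detail | grep inconsistent
#   ceph pg map 0.6                # 0.6 is a placeholder PG id
#
# On each of those OSD hosts, locate the object's file and checksum it:
#   find /var/lib/ceph/osd/ceph-*/current/0.6_head -name '*object-name*' \
#       -exec md5sum {} \;
#
# With the files (or checksums) collected in one place, the replica whose
# MD5 differs from the majority is the suspect:
odd_one_out() {
    # Prints the path of the file whose md5sum differs from all others.
    md5sum "$@" | awk '{count[$1]++; file[$1]=$NF}
        END {for (h in count) if (count[h] == 1) print file[h]}'
}
```

This only helps when a majority of replicas still agree, of course; with three mutually differing copies you are back to guessing.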

Of course, if you somehow wind up with more subtle data corruption and are
faced with 3 slightly differing data sets, you may have to resort to
rolling a die after all.

A word from the devs about the state of checksums and automatic repairs we
can trust would be appreciated.

Christian

On Mon, 11 May 2015 10:19:08 -0600 Robert LeBlanc wrote:

> Personally, I would not just run this command automatically because, as
> you stated, it only copies the primary PGs to the replicas; if the
> primary is corrupt, you will corrupt your secondaries. I think the
> monitor log shows which OSD has the problem, so if it is not your
> primary, then just issue the repair command.
> 
> There was talk, and I believe work towards, Ceph storing a hash of the
> object so that it can be smarter about which replica has the correct data
> and automatically replicate the good data no matter where it is. I think
> the first part, creating the hash and storing it, has been included in
> Hammer. I'm not an authority on this so take it with a grain of salt.
> 
> Right now our procedure is to find the PG files on the OSDs, compute an
> MD5 of all of them, and overwrite the one that doesn't match, either by
> issuing the PG repair command, or by removing the bad PG files, rsyncing
> them over with the -X argument, and then instructing a deep-scrub on the
> PG to clear it up in Ceph.
> 
> I've only tested this on an idle cluster, so I don't know how well it
> will work on an active cluster. Since we issue a deep-scrub, if the PGs
> of the replicas change during the rsync, it should come up with an
> error. The idea is to keep rsyncing until the deep-scrub is clean. Be
> warned that you may be aiming your gun at your foot with this!
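[Robert's procedure above could be sketched roughly as follows; the PG id, hostnames, OSD paths and rsync flags are illustrative, and this has the same foot-gun caveat on an active cluster:]

```shell
#!/bin/sh
# Rough sketch of the quoted repair procedure; all names are placeholders.
#
# 1. Checksum the PG's files on every replica and spot the odd one out.
# 2. If the bad copy is NOT the primary, a plain repair may suffice:
#      ceph pg repair 0.6
# 3. Otherwise, replace the bad files from a known-good replica,
#    preserving xattrs (-X), then deep-scrub until clean:
#      rsync -avX good-host:/var/lib/ceph/osd/ceph-12/current/0.6_head/ \
#                 /var/lib/ceph/osd/ceph-3/current/0.6_head/
#      ceph pg deep-scrub 0.6
#
# "Keep rsyncing until the deep-scrub is clean" is essentially a retry loop:
repair_until_clean() {
    # $1: command that syncs the PG files from the good replica
    # $2: command that deep-scrubs and exits 0 only when the PG is clean
    tries=0
    while [ "$tries" -lt 5 ]; do          # give up after 5 attempts
        $1
        if $2; then
            return 0                      # deep-scrub came back clean
        fi
        tries=$((tries + 1))
    done
    return 1
}
```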
> 
> ----------------
> Robert LeBlanc
> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> On Mon, May 11, 2015 at 2:09 AM, Christian Eichelmann <
> christian.eichelmann@xxxxxxxx> wrote:
> 
> > Hi all!
> >
> > We are experiencing approximately 1 scrub error / inconsistent pg every
> > two days. As far as I know, to fix this you can issue a "ceph pg
> > repair", which works fine for us. I have a few questions regarding the
> > behavior of the ceph cluster in such a case:
> >
> > 1. After ceph detects the scrub error, the pg is marked as
> > inconsistent. Does that mean that any IO to this pg is blocked until
> > it is repaired?
> >
> > 2. Is this number of scrub errors normal? We currently have only 150TB
> > in our cluster, distributed over 720 2TB disks.
> >
> > 3. As far as I know, a "ceph pg repair" just copies the content of the
> > primary PG to all replicas. Is this still the case? What if the primary
> > copy is the one with errors? We have a 4x replication level, and it
> > would be cool if Ceph used for recovery one of the replicas whose
> > checksum matches the majority.
> >
> > 4. Some of these errors happen at night. Since Ceph reports this as a
> > critical error, our on-call shift is woken up just to issue a single
> > command. Do you see any problem in triggering this command
> > automatically via a monitoring event? Is there a reason why Ceph isn't
> > resolving these errors itself when it has enough replicas to do so?
> >
> > Regards,
> > Christian
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/