Re: Scrub Error / How does ceph pg repair work?

Personally, I would not run this command automatically because, as you stated, it only copies the primary PG's data to the replicas; if the primary is the corrupt copy, you will corrupt your secondaries. I believe the monitor log shows which OSD has the problem, so if it is not your primary, just issue the repair command.
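To illustrate the manual check, here is a hedged sketch of pulling the PG id and the primary OSD out of an inconsistent-PG line before deciding whether repair is safe. The sample line is an assumption modeled on what `ceph health detail` typically prints; the exact wording varies by release, so adjust the parsing to your output.

```shell
# Hypothetical health line; real output comes from `ceph health detail`
# and the exact format may differ between Ceph releases.
line='pg 17.1c1 is active+clean+inconsistent, acting [21,25,30]'

# Second field is the PG id.
pgid=$(echo "$line" | awk '{print $2}')
# The first OSD in the acting set is the primary.
primary=$(echo "$line" | sed 's/.*acting \[\([0-9]*\),.*/\1/')

echo "pg=$pgid primary=osd.$primary"
# Only if the corrupt copy is NOT on the primary would you then run:
#   ceph pg repair "$pgid"
```

The point of the last comment is exactly the caveat above: repair clones the primary outward, so identify the primary first.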

There has been talk of, and I believe work toward, Ceph storing a hash of each object so that it can be smarter about which replica has the correct data and automatically replicate the good copy no matter where it lives. I think the first part, creating and storing the hash, was included in Hammer. I'm not an authority on this, so take it with a grain of salt.

Right now our procedure is to find the PG files on the OSDs and run an MD5 over all of them. The copy that doesn't match gets overwritten, either by issuing the PG repair command, or by removing the bad PG files, rsyncing a good copy over with the -X argument (to preserve extended attributes), and then instructing a deep-scrub on the PG to clear the error in Ceph.
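The core of that procedure is a majority vote on the checksums. Here is a self-contained sketch of that comparison step using throwaway files; the file names are hypothetical stand-ins for the per-OSD PG directories (on a FileStore OSD the objects for a PG live under a path like /var/lib/ceph/osd/ceph-$id/current/$pgid_head/).

```shell
set -e
# Stand-ins for the same object as stored on three OSDs; two copies
# agree and one is corrupt.
dir=$(mktemp -d)
printf 'good data' > "$dir/replica-osd21"
printf 'good data' > "$dir/replica-osd25"
printf 'bad data'  > "$dir/replica-osd30"
cd "$dir"

# Hash every copy, then flag the one whose digest is in the minority.
bad=$(md5sum replica-* | awk '{count[$1]++; file[$1]=$2}
    END {for (h in count) if (count[h] == 1) print file[h]}')
echo "odd one out: $bad"
```

With a 4x pool you have four digests to compare, and a lone mismatch is almost certainly the bad copy; with only two replicas the vote is a tie and this method cannot decide for you.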

I've only tested this on an idle cluster, so I don't know how well it works on an active one. Since we issue a deep-scrub afterwards, if the replicas' PG files change during the rsync, it should come up with an error; the idea is to keep rsyncing until the deep-scrub comes back clean. Be warned that you may be aiming your gun at your foot with this!

----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Mon, May 11, 2015 at 2:09 AM, Christian Eichelmann <christian.eichelmann@xxxxxxxx> wrote:
Hi all!

We are experiencing approximately one scrub error / inconsistent PG every
two days. As far as I know, to fix this you can issue a "ceph pg
repair", which works fine for us. I have a few questions regarding the
behavior of the Ceph cluster in such a case:

1. After Ceph detects the scrub error, the PG is marked as inconsistent.
Does that mean that any I/O to this PG is blocked until it is repaired?

2. Is this rate of scrub errors normal? We currently have only 150TB
in our cluster, distributed over 720 2TB disks.

3. As far as I know, a "ceph pg repair" just copies the content of the
primary PG to all replicas. Is this still the case? What if the primary
copy is the one with errors? We have a 4x replication level, and it
would be good if Ceph recovered from a copy whose checksum matches the
majority of the replicas.

4. Some of these errors happen at night. Since Ceph reports this
as a critical error, our on-call shift is woken up just to issue a
single command. Do you see any problem with triggering this command
automatically via a monitoring event? Is there a reason why Ceph doesn't
resolve these errors itself when it has enough replicas to do so?

Regards,
Christian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

