Re: pg scrub and auto repair in hammer

On Tue, 28 Jun 2016 09:15:50 +0200 Stefan Priebe - Profihost AG wrote:

> 
> Am 28.06.2016 um 09:06 schrieb Christian Balzer:
> > 
> > Hello,
> > 
> > On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:
> > 
> >> Am 27.06.2016 um 02:14 schrieb Christian Balzer:
> >>> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> is there any option or chance to have auto repair of pgs in hammer?
> >>>>
> >>> Short answer: 
> >>> No, in any version of Ceph.
> >>>
> >>> Long answer:
> >>> There are currently no checksums generated or kept by Ceph that would
> >>> facilitate this.
> >>
> >> Yes, but with a replication count of 3, ceph pg repair has always
> >> worked for me since Bobtail. I've never seen corrupted data.
> >>
> > That's good and lucky for you.
> > 
> > Not seeing corrupted data also doesn't mean there wasn't any
> > corruption; it could simply mean that the data in question wasn't
> > used, or was overwritten before it was read again.
> 
> Sure, that's correct ;-) It has just happened so often that I thought it
> could not always be the case. We had a lot of kernel crashes over the
> last month involving XFS and bcache.
> 
That's something slightly different from silent data corruption, but I
can't really comment on it.

> > In the handful of scrub errors I have ever encountered, there was one
> > case where blindly doing a repair from the primary PG would have been
> > the wrong thing to do.
> 
> Are you sure it really simply uses the primary PG? I always thought it
> compares the object sizes and dates with a replication factor of 3.
> 
Yes, I'm sure:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg10182.html
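
For what it's worth, here is a rough sketch (my own, nothing official) of how
one could list the inconsistent PGs and see which OSD is the acting primary
before deciding whether a blind repair is safe. The JSON field names
("pg_stats", "acting_primary") are what I'd expect from a hammer-era
"ceph pg dump", so verify them against your own cluster first:

#!/usr/bin/env python
# Sketch: list PGs whose state contains "inconsistent" and show their
# acting set / primary before anyone decides to run "ceph pg repair".
# Assumes the hammer-era JSON layout of "ceph pg dump" (a "pg_stats" list
# with "pgid", "state", "acting", "acting_primary").
import json
import subprocess

def pg_dump():
    out = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
    return json.loads(out)

def inconsistent_pgs(dump):
    for pg in dump.get("pg_stats", []):
        if "inconsistent" in pg.get("state", ""):
            yield pg

def main():
    for pg in inconsistent_pgs(pg_dump()):
        print("pg %s state=%s acting=%s primary=osd.%s" % (
            pg["pgid"], pg["state"], pg.get("acting"),
            pg.get("acting_primary", "?")))
        # "ceph pg repair <pgid>" would copy the primary's version over the
        # replicas, which is exactly why you want to look at the replicas
        # yourself before issuing it.

if __name__ == "__main__":
    main()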

Sage also replied to you in a similar vein about this 4 years ago:
http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11575
"In general we don't repair automatically lest we inadvertantly propagate
bad data or paper over a bug."

And finally, see the BlueStore tech talk from last week, from 35:40 onward:
https://www.youtube.com/watch?v=kuacS4jw5pM
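
Before repairing, I would rather compare the copies by hand. With filestore
the object is just a plain file under
/var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/, so once you have collected
the copies from the OSD hosts, something along these lines (the paths below
are placeholders, of course) shows whether the primary is actually the good
one:

#!/usr/bin/env python
# Sketch only: checksum each on-disk replica of one object and flag the
# outlier, so you know whether trusting the primary is safe.
import hashlib
from collections import Counter

def md5sum(path, bufsize=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical local copies of the same object, one per OSD in the acting set.
replicas = {
    "osd.1 (primary)": "/tmp/pg_2.1f/osd1/rbd_data.1234.obj",
    "osd.4":           "/tmp/pg_2.1f/osd4/rbd_data.1234.obj",
    "osd.7":           "/tmp/pg_2.1f/osd7/rbd_data.1234.obj",
}

digests = {osd: md5sum(path) for osd, path in replicas.items()}
majority, _ = Counter(digests.values()).most_common(1)[0]

for osd, digest in sorted(digests.items()):
    flag = "OK" if digest == majority else "SUSPECT"
    print("%-18s %s  %s" % (osd, digest, flag))
# If the primary is the SUSPECT copy, a blind "ceph pg repair" would spread
# the bad data to the healthy replicas instead of fixing anything.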

Christian
> 
> >>> If you ran BTRFS or ZFS with filestore, you'd be closer to an
> >>> automatic state of affairs, as these filesystems keep strong
> >>> checksums, verify them on reads, and would raise an immediate I/O
> >>> error if something got corrupted, thus making it clear which OSD is
> >>> in need of the hammer of healing.
> >>
> >> Yes, but at least BTRFS is still not working for Ceph due to
> >> fragmentation. I even tested a 4.6 kernel a few weeks ago, but it
> >> doubles its I/O after a few days.
> >>
> > Nobody (well, certainly not me) suggested using BTRFS, especially with
> > Bluestore "around the corner".
> > 
> > Just pointing out that it has the necessary checksumming features.
> 
> Sure, sorry.
> 
> Stefan
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


