Re: pg scrub and auto repair in hammer

On 28.06.2016 at 09:42, Christian Balzer wrote:
> On Tue, 28 Jun 2016 09:15:50 +0200 Stefan Priebe - Profihost AG wrote:
> 
>>
>> On 28.06.2016 at 09:06, Christian Balzer wrote:
>>>
>>> Hello,
>>>
>>> On Tue, 28 Jun 2016 08:34:26 +0200 Stefan Priebe - Profihost AG wrote:
>>>
>>>> On 27.06.2016 at 02:14, Christian Balzer wrote:
>>>>> On Sun, 26 Jun 2016 19:48:18 +0200 Stefan Priebe wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> is there any option or way to have auto repair of PGs in hammer?
>>>>>>
>>>>> Short answer: 
>>>>> No, in any version of Ceph.
>>>>>
>>>>> Long answer:
>>>>> There are currently no checksums generated and kept by Ceph that would
>>>>> facilitate this.
>>>>
>>>> Yes, but with a replication count of 3, ceph pg repair has always worked
>>>> for me since bobtail. I've never seen corrupted data.
>>>>
>>> That's good and lucky for you.
>>>
>>> Not seeing corrupted data also doesn't mean there wasn't any corruption;
>>> it could simply mean that the data in question was never read again, or
>>> was overwritten before being read again.
>>
>> Sure, that's correct ;-) It has just happened so often that I thought
>> this could not always be the case. We had a lot of kernel crashes over
>> the last month related to XFS and bcache.
>>
> That's something slightly different from silent data corruption, but I
> can't really comment on this.

Ah OK - no, I was not talking about silent data corruption.

> 
>>> In the handful of scrub errors I have ever encountered, there was one case
>>> where blindly doing a repair from the primary PG would have been the
>>> wrong thing to do.
>>
>> Are you sure it really just uses the primary PG? I always thought it
>> compares the object sizes and dates with a replication factor of 3.
>>
> Yes, I'm sure:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg10182.html
> 
> Sage also replied in a similar vein to you about this 4 years ago:
> http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11575
> "In general we don't repair automatically lest we inadvertantly propagate
> bad data or paper over a bug."
> 
> And finally, the BlueStore tech talk from last week, starting at 35:40.
> https://www.youtube.com/watch?v=kuacS4jw5pM

Thanks!
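
For the archives, here is a rough sketch of what one could script to list the
inconsistent PGs and lay out the manual checks before deciding on a repair,
instead of repairing blindly. This is only an illustration under some
assumptions: hammer-era filestore under /var/lib/ceph/osd/ceph-<id>/, the
usual plain-text "ceph health detail" line format, and made-up helper names
(not an official tool).

#!/usr/bin/env python
# Sketch: list inconsistent PGs and print the manual checks one might run on
# each acting OSD before deciding whether "ceph pg repair" is safe.
# Assumes health detail lines like:
#   pg 2.5 is active+clean+inconsistent, acting [1,3,7]
import re
import subprocess

def inconsistent_pgs():
    """Parse `ceph health detail` for PGs flagged inconsistent (hypothetical helper)."""
    out = subprocess.check_output(["ceph", "health", "detail"]).decode()
    pgs = []
    for line in out.splitlines():
        m = re.match(r"pg (\S+) is .*inconsistent.*acting \[([\d,]+)\]", line)
        if m:
            pgs.append((m.group(1), [int(o) for o in m.group(2).split(",")]))
    return pgs

if __name__ == "__main__":
    for pgid, acting in inconsistent_pgs():
        primary = acting[0]  # the first OSD in the acting set is the primary
        print("PG %s, acting %s (primary is osd.%d)" % (pgid, acting, primary))
        # Get the object name from the deep-scrub error in the OSD log, then
        # compare the on-disk copies on every acting OSD before repairing:
        for osd in acting:
            print("  on the host of osd.%d:" % osd)
            print("    find /var/lib/ceph/osd/ceph-%d/current/%s_head/"
                  " -name '<object>*' | xargs md5sum" % (osd, pgid))
        # Only once the primary's copy is known to be the good one:
        print("  ceph pg repair %s" % pgid)

The point being, as above: in hammer, "ceph pg repair" simply overwrites the
replicas with whatever the primary has, so the comparison step is the part
not to skip.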

> 
> Christian
>>
>>>>> If you ran BTRFS or ZFS with filestore you'd be closer to an automatic
>>>>> state of affairs, as these filesystems compute strong checksums, verify
>>>>> them on reads, and would raise an immediate I/O error if something got
>>>>> corrupted, thus making it clear which OSD is in need of the hammer of
>>>>> healing.
>>>>
>>>> Yes, but at least BTRFS is still not working for Ceph due to
>>>> fragmentation. I even tested a 4.6 kernel a few weeks ago, but it
>>>> doubles its I/O after a few days.
>>>>
>>> Nobody (well, certainly not me) suggested using BTRFS, especially with
>>> BlueStore "around the corner".
>>>
>>> Just pointing out that it has the necessary checksumming features.
>>
>> Sure, sorry.
>>
>> Stefan
>>
> 
> 
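
Regarding the checksumming filesystems: the way that shows up in practice is
that reading the corrupted object file on the affected OSD fails with EIO
instead of silently returning bad data, so the OSD holding the bad copy
identifies itself. A minimal illustration (the paths you would feed it are
hypothetical filestore object files; again just a sketch, not a tool):

#!/usr/bin/env python
# Read a file end to end; on BTRFS/ZFS a data-checksum mismatch surfaces as
# an I/O error (EIO) rather than as wrong bytes.
import errno
import sys

def copy_reads_cleanly(path):
    """Return True if the whole file can be read, False on a checksum EIO."""
    try:
        with open(path, "rb") as f:
            while f.read(4 * 1024 * 1024):  # read in 4 MB chunks
                pass
        return True
    except IOError as e:
        if e.errno == errno.EIO:
            return False
        raise

if __name__ == "__main__":
    # e.g. object files under /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/
    for path in sys.argv[1:]:
        status = "reads cleanly" if copy_reads_cleanly(path) else "EIO -> bad copy"
        print("%s: %s" % (path, status))
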
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


