Re: Continuous spurious repairs without cause?

Hi,

Interesting, that’s something we can definitely try!

Thanks!

Christian

> On 5. Sep 2023, at 16:37, Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
> 
> Hi,
> 
> in older versions of Ceph with the auto-repair feature enabled, the
> PG state of scrubbing PGs always included the repair state as well.
> In later versions (I don't know exactly which one) Ceph
> differentiates scrubbing and repair in the PG state again.
> 
> I think as long as there are no errors logged, all should be fine. If
> you disable auto repair, the issue should disappear as well. In case
> of scrub errors you will then see the appropriate states.
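> 
> Something like this is what I mean (a minimal sketch, assuming the
> centralized config store of Nautilus and later):
> 
>   # check whether auto-repair is currently enabled
>   ceph config get osd osd_scrub_auto_repair
> 
>   # disable it cluster-wide and see whether the repair states go away
>   ceph config set osd osd_scrub_auto_repair false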
> 
> Regards
> Manuel
> 
> On Tue, 05 Sep 2023 14:14:56 +0000
> Eugen Block <eblock@xxxxxx> wrote:
> 
>> Hi,
>> 
>> it sounds like you have auto-repair enabled (osd_scrub_auto_repair). I
>> guess you could disable that to see what's going on with the PGs and
>> their replicas. And/or you could enable debug logs. Are all daemons
>> running the same Ceph (minor) version? I remember a customer case
>> where different Ceph minor versions (but all of them Octopus) caused
>> damaged PGs; a repair fixed them every time. After they updated all
>> daemons to the same minor version, those errors were gone.
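>> 
>> Roughly, as a sketch (plain ceph CLI, nothing cluster-specific):
>> 
>>   # confirm that all daemons report the same version
>>   ceph versions
>> 
>>   # temporarily raise OSD debug logging while a scrub/repair runs ...
>>   ceph tell osd.* injectargs '--debug-osd 10/10'
>>   # ... and restore the default afterwards
>>   ceph tell osd.* injectargs '--debug-osd 1/5'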
>> 
>> Regards,
>> Eugen
>> 
>> Quoting Christian Theune <ct@xxxxxxxxxxxxxxx>:
>> 
>>> Hi,
>>> 
>>> this is a bit of an older cluster (Nautilus, BlueStore only).
>>> 
>>> We’ve noticed that the cluster is almost continuously repairing PGs.
>>> However, they all finish successfully with “0 fixed”. We cannot see
>>> what triggers Ceph to repair the PGs, and it’s happening for a lot
>>> of PGs, not any specific individual one.
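>>> 
>>> For reference, we are just watching this with the standard CLI; a
>>> rough sketch of what we run (log path is the usual default):
>>> 
>>>   # PGs currently in a scrubbing or repair state
>>>   ceph pg dump pgs_brief | grep -E 'repair|scrubbing'
>>> 
>>>   # the “repair ok, 0 fixed” messages end up in the cluster log
>>>   grep 'repair ok' /var/log/ceph/ceph.log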
>>> 
>>> Deep-scrubs are generally running, but currently a bit late as we  
>>> had some recoveries in the last week.
>>> 
>>> Logs look regular aside from the number of repairs. Here are the
>>> last few weeks from the perspective of a single PG. There’s one
>>> repair, but the same thing seems to happen for all PGs.
>>> 
>>> 2023-08-06 16:08:17.870 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-06 16:08:18.270 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-07 21:52:22.299 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-07 21:52:22.711 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-09 00:33:42.587 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-09 00:33:43.049 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-10 09:36:00.590 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub starts
>>> 2023-08-10 09:36:28.811 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub ok
>>> 2023-08-11 12:59:14.219 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-11 12:59:14.567 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-12 13:52:44.073 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-12 13:52:44.483 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-14 01:51:04.774 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub starts
>>> 2023-08-14 01:51:33.113 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub ok
>>> 2023-08-15 05:18:16.093 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-15 05:18:16.520 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-16 09:47:38.520 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-16 09:47:38.930 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-17 19:25:45.352 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-17 19:25:45.775 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-19 05:40:43.663 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-19 05:40:44.073 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-20 12:06:54.343 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-20 12:06:54.809 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-21 19:23:10.801 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub starts
>>> 2023-08-21 19:23:39.936 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub ok
>>> 2023-08-23 03:43:21.391 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-23 03:43:21.844 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-24 04:21:17.004 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub starts
>>> 2023-08-24 04:21:47.972 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub ok
>>> 2023-08-25 06:55:13.588 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-25 06:55:14.087 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-26 09:26:01.174 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-26 09:26:01.561 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-27 11:18:10.828 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-27 11:18:11.264 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-28 19:05:42.104 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-28 19:05:42.693 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-30 07:03:10.327 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-08-30 07:03:10.805 7fc49f1e6640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-08-31 14:43:23.849 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub starts
>>> 2023-08-31 14:43:50.723 7fc49b1de640  0 log_channel(cluster) log [DBG] : 278.2f3 deep-scrub ok
>>> 2023-09-01 20:53:42.749 7f37ca268640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-09-01 20:53:43.389 7f37c6260640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-09-02 22:57:49.542 7f37ca268640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-09-02 22:57:50.065 7f37c6260640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-09-04 03:16:14.754 7f37ca268640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub starts
>>> 2023-09-04 03:16:15.295 7f37ca268640  0 log_channel(cluster) log [DBG] : 278.2f3 scrub ok
>>> 2023-09-05 14:50:36.064 7f37ca268640  0 log_channel(cluster) log [DBG] : 278.2f3 repair starts
>>> 2023-09-05 14:51:04.407 7f37c6260640  0 log_channel(cluster) log [DBG] : 278.2f3 repair ok, 0 fixed
>>> 
>>> Googling didn’t help, unfortunately, and the bug tracker doesn’t
>>> appear to have any relevant issue either.
>>> 
>>> Any ideas?
>>> 
>>> Kind regards,
>>> Christian Theune
>>> 
>>> --
>>> Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian  
>>> Zagrodnick

Kind regards,
Christian Theune

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



