Re: recurring stat mismatch on PG


 



Yes, it is from the primary OSD. I extracted it with grep -e scrub -e repair -e 19.1fff /var/log/ceph/ceph-osd.338.log and then copied only the relevant lines.

Yes, as you say, I should just run a deep-scrub and see. I guess if this error was cleared by an aborted repair, this would be a new bug? I will run a deep-scrub and report back.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: 08 October 2022 11:18:37
To: Frank Schilder
Cc: Ceph Users
Subject: Re: recurring stat mismatch on PG

Is that the log from the primary OSD?

About the restart, you should probably just deep-scrub again to see the current state.


.. Dan



On Sat, Oct 8, 2022, 11:14 Frank Schilder <frans@xxxxxx> wrote:
Hi Dan,

yes, 15.2.17. I remember that case and was expecting it to be fixed. Here is a relevant extract from the log:

2022-10-08T10:06:22.206+0200 7fa3c48c7700  0 log_channel(cluster) log [DBG] : 19.1fff deep-scrub starts
2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fffs0 deep-scrub : stat mismatch, got 64532/64531 objects, 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fff deep-scrub 1 errors
2022-10-08T10:38:20.618+0200 7fa3c48c7700  0 log_channel(cluster) log [DBG] : 19.1fff repair starts
2022-10-08T10:54:25.801+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fffs0 repair : stat mismatch, got 64532/64531 objects, 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2022-10-08T10:54:25.802+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fff repair 1 errors, 1 fixed
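
The two sides of the stat mismatch differ in only a few counters, which is easier to see with a small parser. This is a minimal Python sketch: `mismatch_deltas` and `PAIR_RE` are hypothetical names, and the regex assumes the counter format shown in the ERR line above (`<a>/<b> <name>`, comma-separated, ending in a period). For this line it reports a difference of 1 object, 1 dirty object and 4194304 bytes (exactly 4 MiB). Reading that as "one whole 4 MiB object is missing from one side" is an inference from the numbers alone, not something the log states.

```python
import re

# Message part of the deep-scrub ERR line quoted in this thread.
LOG_LINE = (
    "19.1fffs0 deep-scrub : stat mismatch, got 64532/64531 objects, "
    "1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, "
    "0/0 hit_set_archive, 1215/1215 whiteouts, "
    "170978253582/170974059278 bytes, 0/0 manifest objects, "
    "0/0 hit_set_archive bytes."
)

# Assumed format: "<a>/<b> <name>" pairs separated by commas, last one ending
# in "."; names may contain spaces ("manifest objects").
PAIR_RE = re.compile(r"(\d+)/(\d+) ([a-z_ ]+?)(?=,|\.$)")


def mismatch_deltas(line):
    """Return {counter: a - b} for every counter whose two values differ."""
    stats = line.split("stat mismatch, got ", 1)[1]
    deltas = {}
    for a, b, name in PAIR_RE.findall(stats):
        if a != b:
            deltas[name] = int(a) - int(b)
    return deltas


print(mismatch_deltas(LOG_LINE))
# {'objects': 1, 'dirty': 1, 'bytes': 4194304}
```

All other counters (clones, whiteouts, omap, and so on) match, which is why only three entries are reported.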

I just completed a repair and it's gone for now. As an alternative explanation: we had this scrub error and I started a repair, but then the OSDs in that PG were shut down and restarted. Is it possible that the repair was cancelled and the error was cleared erroneously?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: 08 October 2022 11:03:05
To: Frank Schilder
Cc: Ceph Users
Subject: Re: recurring stat mismatch on PG

Hi,

Is that 15.2.17? It reminds me of this bug - https://tracker.ceph.com/issues/52705 - where an object with a particular name would hash to ffffffff and cause a stat mismatch during scrub. But 15.2.17 should have the fix for that.
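
One reason that bug is a plausible suspect here is the PG id itself. Ceph folds a 32-bit object hash into a placement seed with `ceph_stable_mod()`, and the PG id prints that seed in hex, so 19.1fff is seed 0x1fff of pool 19. The thread does not state pool 19's pg_num; assuming it lies between 8192 and 16383, an object hashing to ffffffff lands exactly in 19.1fff. The Python port below is a sketch for illustration only: `pg_num_mask` and `pg_seed` are hypothetical helper names, and the mask calculation is assumed to mirror how Ceph derives the pool's pg_num mask.

```python
def pg_num_mask(pg_num):
    # Smallest all-ones bit mask covering pg_num - 1,
    # e.g. 0x1fff (8191) for pg_num = 8192.
    return (1 << (pg_num - 1).bit_length()) - 1


def ceph_stable_mod(x, b, bmask):
    # Port of ceph_stable_mod(): map hash x into [0, b) so that raising
    # pg_num splits existing PGs instead of reshuffling every object.
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_seed(obj_hash, pg_num):
    # Hypothetical helper: placement seed of an object hash in a pool
    # with pg_num placement groups.
    return ceph_stable_mod(obj_hash, pg_num, pg_num_mask(pg_num))


# Where does hash 0xffffffff land for a few example pg_num values?
for pg_num in (4096, 8192, 12000, 16384):
    print(pg_num, hex(pg_seed(0xffffffff, pg_num)))
# 4096 0xfff
# 8192 0x1fff
# 12000 0x1fff
# 16384 0x3fff
```

With pg_num at 16384 or above (or below 8192) the same hash maps to a different PG, so the match only holds under the stated pg_num assumption.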


Can you find the relevant OSD log for more info?

.. Dan



On Sat, Oct 8, 2022, 10:42 Frank Schilder <frans@xxxxxx> wrote:
Hi all,

I am observing something strange on an Octopus (latest) cluster. We have a PG with a stat mismatch:

2022-10-08T10:06:22.206+0200 7fa3c48c7700  0 log_channel(cluster) log [DBG] : 19.1fff deep-scrub starts
2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fffs0 deep-scrub : stat mismatch, got 64532/64531 objects, 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log [ERR] : 19.1fff deep-scrub 1 errors

This exact same mismatch was found before, and I ran a pg repair that fixed it. Now it's back. Does anyone have an idea why this might be happening and how to deal with it?

Thanks!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



