Is that the log from the primary OSD?

About the restart, you should probably just deep-scrub again to see the
current state (a short command sketch is appended at the bottom of this
mail).

.. Dan

On Sat, Oct 8, 2022, 11:14 Frank Schilder <frans@xxxxxx> wrote:
> Hi Dan,
>
> yes, 15.2.17. I remember that case and was expecting it to be fixed. Here
> is a relevant extract from the log:
>
> 2022-10-08T10:06:22.206+0200 7fa3c48c7700 0 log_channel(cluster) log
> [DBG] : 19.1fff deep-scrub starts
> 2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fffs0 deep-scrub : stat mismatch, got 64532/64531 objects,
> 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0
> hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0
> manifest objects, 0/0 hit_set_archive bytes.
> 2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fff deep-scrub 1 errors
> 2022-10-08T10:38:20.618+0200 7fa3c48c7700 0 log_channel(cluster) log
> [DBG] : 19.1fff repair starts
> 2022-10-08T10:54:25.801+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fffs0 repair : stat mismatch, got 64532/64531 objects,
> 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0
> hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0
> manifest objects, 0/0 hit_set_archive bytes.
> 2022-10-08T10:54:25.802+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fff repair 1 errors, 1 fixed
>
> I just completed a repair and it's gone for now. As an alternative
> explanation: we had this scrub error, I started a repair, but then OSDs in
> that PG were shut down and restarted. Is it possible that the repair was
> cancelled and the error cleared erroneously?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dvanders@xxxxxxxxx>
> Sent: 08 October 2022 11:03:05
> To: Frank Schilder
> Cc: Ceph Users
> Subject: Re: recurring stat mismatch on PG
>
> Hi,
>
> Is that 15.2.17? It reminds me of this bug -
> https://tracker.ceph.com/issues/52705 - where an object with a particular
> name would hash to ffffffff and cause a stat mismatch during scrub. But
> 15.2.17 should have the fix for that.
>
> Can you find the relevant OSD log for more info?
>
> .. Dan
>
> On Sat, Oct 8, 2022, 10:42 Frank Schilder <frans@xxxxxx> wrote:
>
> Hi all,
>
> I seem to observe something strange on an Octopus (latest) cluster. We
> have a PG with a stat mismatch:
>
> 2022-10-08T10:06:22.206+0200 7fa3c48c7700 0 log_channel(cluster) log
> [DBG] : 19.1fff deep-scrub starts
> 2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fffs0 deep-scrub : stat mismatch, got 64532/64531 objects,
> 1243/1243 clones, 64532/64531 dirty, 0/0 omap, 0/0 pinned, 0/0
> hit_set_archive, 1215/1215 whiteouts, 170978253582/170974059278 bytes, 0/0
> manifest objects, 0/0 hit_set_archive bytes.
> 2022-10-08T10:22:33.049+0200 7fa3c48c7700 -1 log_channel(cluster) log
> [ERR] : 19.1fff deep-scrub 1 errors
>
> This exact same mismatch was found before and I executed a pg repair that
> fixed it. Now it's back. Does anyone have an idea why this might be
> happening and how to deal with it?
>
> Thanks!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
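
P.S. As a reference for the commands alluded to above, here is a minimal
sketch (the PG id 19.1fff is taken from your logs; treat this as a
starting point rather than a recipe, since what list-inconsistent-obj
reports for a pure stat mismatch depends on your setup):

  # re-run the deep-scrub on the affected PG and watch for its result
  ceph pg deep-scrub 19.1fff
  ceph -w | grep 19.1fff     # or tail the log of the acting primary OSD

  # list what scrub recorded as inconsistent; this may come back empty,
  # since a stat mismatch lives in the PG stats rather than in an object
  rados list-inconsistent-obj 19.1fff --format=json-pretty

  # if the mismatch reappears, repair again
  ceph pg repair 19.1fff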