Re: Scrub stuck and 'pg has invalid (post-split) stat'

On Thu, Feb 22, 2024 at 12:37 PM Eugen Block <eblock@xxxxxx> wrote:
> You haven't said yet whether you changed the hit_set_count to 0.

Not yet, we will give it a try ASAP.
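
If I understand correctly, that would be something like this (assuming the
cache pool is vms_cache, as it appears elsewhere in this thread):

ceph osd pool set vms_cache hit_set_count 0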

> Have you already tried to set the primary OSD out and wait for the
> backfill to finish?

No, we will try that as well.
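
Presumably something along these lines, with the PG ID and OSD ID as
hypothetical placeholders:

ceph pg map 2.4   # shows the up/acting set; the first acting OSD is the primary
ceph osd out 12   # mark that primary OSD out and let backfill move the data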

> And another question, are all services running pacific already and on
> the same version (ceph versions)?

Yes, all daemons run 16.2.13.

>
> Zitat von Cedric <yipikai7@xxxxxxxxx>:
>
> > Yes, osd_scrub_invalid_stats is set to true.
> >
> > We are thinking about running "ceph pg <pgid> mark_unfound_lost revert",
> > but we wonder whether it carries a risk of data loss.
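> >
> > A sketch of how we would approach it, with a hypothetical pg ID:
> >
> > ceph pg 2.4 list_unfound              # inspect the unfound objects first
> > ceph pg 2.4 mark_unfound_lost revert  # revert them to their last known version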
> >
> > On Thu, Feb 22, 2024 at 11:50 AM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> I found a config option that forces a scrub of PGs with invalid stats;
> >> what is your current setting for it?
> >>
> >> ceph config get osd osd_scrub_invalid_stats
> >> true
> >>
> >> The config reference states:
> >>
> >> > Forces extra scrub to fix stats marked as invalid.
> >>
> >> But the default seems to be true, so I'd expect it's true in your case
> >> as well?
> >>
> >> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >>
> >> > Thanks Eugen for the suggestion; yes, we have tried that, and also
> >> > repeered the affected PGs, but the issue remains the same.
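> >> >
> >> > For the record, the repeer was issued per PG, along these lines (pg ID
> >> > hypothetical):
> >> >
> >> > ceph pg repeer 2.4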
> >> >
> >> > Looking at the code, it seems the post-split message is triggered when
> >> > the PG has "stats_invalid": true; here is the result of a pg query:
> >> >
> >> > "stats_invalid": true,
> >> >                 "dirty_stats_invalid": false,
> >> >                 "omap_stats_invalid": false,
> >> >                 "hitset_stats_invalid": false,
> >> >                 "hitset_bytes_stats_invalid": false,
> >> >                 "pin_stats_invalid": false,
> >> >                 "manifest_stats_invalid": false,
> >> >
> >> > I am also attaching again the cluster information that was lost in an
> >> > earlier reply that missed the list. Don't hesitate to ask for more if
> >> > needed; I would be glad to provide it.
> >> >
> >> > Cédric
> >> >
> >> >
> >> > On Thu, Feb 22, 2024 at 11:04 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >>
> >> >> Hm, I wonder if setting (and unsetting after a while) noscrub and
> >> >> nodeep-scrub has any effect. Have you tried that?
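> >> >>
> >> >> Something along these lines:
> >> >>
> >> >> ceph osd set noscrub
> >> >> ceph osd set nodeep-scrub
> >> >> # wait a while, then re-enable scrubbing:
> >> >> ceph osd unset noscrub
> >> >> ceph osd unset nodeep-scrub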
> >> >>
> >> >> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >> >>
> >> >> > Update: we have run fsck and re-sharding on all BlueStore volumes; it
> >> >> > seems the sharding had not been applied.
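> >> >> >
> >> >> > Roughly what we ran per (stopped) OSD; the OSD path and the Pacific
> >> >> > default sharding string are assumptions here:
> >> >> >
> >> >> > ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
> >> >> > ceph-bluestore-tool reshard --path /var/lib/ceph/osd/ceph-0 \
> >> >> >   --sharding "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P"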
> >> >> >
> >> >> > Unfortunately, scrubs and deep-scrubs are still stuck on the PGs of
> >> >> > the pool that is suffering the issue, while other PGs scrub fine.
> >> >> >
> >> >> > The next step will be to remove the cache tier as suggested, but that
> >> >> > is not possible yet, as the PGs need to be scrubbed before the tier
> >> >> > agent can activate.
> >> >> >
> >> >> > As we are struggling to make this cluster work again, any help
> >> >> > would be greatly appreciated.
> >> >> >
> >> >> > Cédric
> >> >> >
> >> >> >> On 20 Feb 2024, at 20:22, Cedric <yipikai7@xxxxxxxxx> wrote:
> >> >> >>
> >> >> >> Thanks Eugen, sorry about the missed reply to all.
> >> >> >>
> >> >> >> The reason we still have the cache tier is that we were not able to
> >> >> >> flush all dirty entries in order to remove it (as per the procedure).
> >> >> >> The cluster was migrated from HDD/SSD to NVMe a while ago, but the
> >> >> >> tiering remains, unfortunately.
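> >> >> >>
> >> >> >> For reference, the removal sequence we attempted is presumably the
> >> >> >> standard one (with our pool names, vms backed by vms_cache); the
> >> >> >> flush-evict step is where it gets stuck:
> >> >> >>
> >> >> >> ceph osd tier cache-mode vms_cache proxy
> >> >> >> rados -p vms_cache cache-flush-evict-all
> >> >> >> ceph osd tier remove-overlay vms
> >> >> >> ceph osd tier remove vms vms_cache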
> >> >> >>
> >> >> >> So for now we are trying to understand the root cause.
> >> >> >>
> >> >> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block <eblock@xxxxxx> wrote:
> >> >> >>>
> >> >> >>> Please don't drop the list from your response.
> >> >> >>>
> >> >> >>> The first question coming to mind is: why do you have a cache tier
> >> >> >>> if all your pools are on NVMe devices anyway? I don't see any benefit
> >> >> >>> here. Did you try the suggested workaround and disable the cache tier?
> >> >> >>>
> >> >> >>> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >> >> >>>
> >> >> >>>> Thanks Eugen, see attached infos.
> >> >> >>>>
> >> >> >>>> Some more details:
> >> >> >>>>
> >> >> >>>> - commands that actually hang: ceph balancer status ; rbd -p vms
> >> >> >>>> ls ; rados -p vms_cache cache-flush-evict-all
> >> >> >>>> - all scrubs running on vms_cache PGs stall and restart in a loop
> >> >> >>>> without actually doing anything
> >> >> >>>> - all I/O is at 0, both in ceph status and in iostat on the nodes
> >> >> >>>>
> >> >> >>>> On Tue, Feb 20, 2024 at 10:00 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >> >>>>>
> >> >> >>>>> Hi,
> >> >> >>>>>
> >> >> >>>>> some more details would be helpful, for example what's the pool
> >> >> >>>>> size of the cache pool? Did you issue a PG split before or during
> >> >> >>>>> the upgrade? This thread [1] deals with the same problem; the
> >> >> >>>>> described workaround was to set hit_set_count to 0 and disable the
> >> >> >>>>> cache layer until that is resolved. Afterwards you could enable the
> >> >> >>>>> cache layer again. But keep in mind that the code for cache tiering
> >> >> >>>>> is entirely removed in Reef (IIRC).
> >> >> >>>>>
> >> >> >>>>> Regards,
> >> >> >>>>> Eugen
> >> >> >>>>>
> >> >> >>>>> [1]
> >> >> >>>>> https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd
> >> >> >>>>>
> >> >> >>>>> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >> >> >>>>>
> >> >> >>>>>> Hello,
> >> >> >>>>>>
> >> >> >>>>>> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13),
> >> >> >>>>>> we encounter an issue with a cache pool becoming completely stuck;
> >> >> >>>>>> the relevant message is below:
> >> >> >>>>>>
> >> >> >>>>>> pg xx.x has invalid (post-split) stats; must scrub before tier
> >> >> >>>>>> agent can activate
> >> >> >>>>>>
> >> >> >>>>>> In the OSD logs, scrubs keep starting in a loop without ever
> >> >> >>>>>> succeeding, for all PGs of this pool.
> >> >> >>>>>>
> >> >> >>>>>> What we already tried without luck so far:
> >> >> >>>>>>
> >> >> >>>>>> - shutting down / restarting the OSDs
> >> >> >>>>>> - rebalancing PGs between OSDs
> >> >> >>>>>> - raising the memory available to the OSDs
> >> >> >>>>>> - repeering the PGs
> >> >> >>>>>>
> >> >> >>>>>> Any idea what is causing this? Any help will be greatly
> >> >> >>>>>> appreciated.
> >> >> >>>>>>
> >> >> >>>>>> Thanks
> >> >> >>>>>>
> >> >> >>>>>> Cédric
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



