Yes, osd_scrub_invalid_stats is set to true. We are thinking about using the "ceph pg <pgid> mark_unfound_lost revert" action, but we wonder if there is a risk of data loss.
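As a rough sketch of the checks we would run first (the PG id 5.1f below is only a placeholder), the idea is to confirm whether any objects are actually reported unfound before marking anything lost, since as far as we understand "revert" rolls an object back to an older version, or forgets it entirely if no previous version exists:

  # list unfound objects, if any (placeholder PG id)
  ceph health detail | grep -i unfound
  ceph pg 5.1f list_unfound
  # last resort only; the newest writes to those objects would be lost:
  # ceph pg 5.1f mark_unfound_lost revert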

On Thu, Feb 22, 2024 at 11:50 AM Eugen Block <eblock@xxxxxx> wrote:
>
> I found a config option to force a scrub of PGs with invalid stats; what
> is your current setting for it?
>
> ceph config get osd osd_scrub_invalid_stats
> true
>
> The config reference states:
>
> > Forces extra scrub to fix stats marked as invalid.
>
> But the default seems to be true, so I'd expect it's true in your case
> as well?
>
> Zitat von Cedric <yipikai7@xxxxxxxxx>:
>
> > Thanks Eugen for the suggestion. Yes, we have tried that, and also
> > repeered the concerned PGs; still the same issue.
> >
> > Looking at the code, it seems the split-mode message is triggered when
> > the PG has "stats_invalid": true; here is the result of a query:
> >
> > "stats_invalid": true,
> > "dirty_stats_invalid": false,
> > "omap_stats_invalid": false,
> > "hitset_stats_invalid": false,
> > "hitset_bytes_stats_invalid": false,
> > "pin_stats_invalid": false,
> > "manifest_stats_invalid": false,
> >
> > I am also providing again the cluster information that was lost in the
> > previous missed reply-all. Don't hesitate to ask for more if needed; I
> > would be glad to provide it.
> >
> > Cédric
> >
> >
> > On Thu, Feb 22, 2024 at 11:04 AM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> Hm, I wonder if setting (and unsetting after a while) noscrub and
> >> nodeep-scrub has any effect. Have you tried that?
> >>
> >> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >>
> >> > Update: we have run fsck and re-sharded all BlueStore volumes; it
> >> > seems sharding had not been applied.
> >> >
> >> > Unfortunately, scrubs and deep-scrubs are still stuck on the PGs of
> >> > the pool that is suffering the issue, but other PGs scrub fine.
> >> >
> >> > The next step will be to remove the cache tier as suggested, but that
> >> > is not possible yet, as the PGs need to be scrubbed before the cache
> >> > tier agent can activate.
> >> >
> >> > As we are struggling to make this cluster work again, any help
> >> > would be greatly appreciated.
> >> >
> >> > Cédric
> >> >
> >> >> On 20 Feb 2024, at 20:22, Cedric <yipikai7@xxxxxxxxx> wrote:
> >> >>
> >> >> Thanks Eugen, sorry about the missed reply-all.
> >> >>
> >> >> The reason we still have the cache tier is that we were not able
> >> >> to flush all dirty entries to remove it (as per the procedure), so
> >> >> the cluster was migrated from HDD/SSD to NVMe a while ago but the
> >> >> tiering remains, unfortunately.
> >> >>
> >> >> So actually we are trying to understand the root cause.
> >> >>
> >> >> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block <eblock@xxxxxx> wrote:
> >> >>>
> >> >>> Please don't drop the list from your response.
> >> >>>
> >> >>> The first question coming to mind is: why do you have a cache tier if
> >> >>> all your pools are on NVMe devices anyway? I don't see any benefit here.
> >> >>> Did you try the suggested workaround and disable the cache tier?
> >> >>>
> >> >>> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >> >>>
> >> >>>> Thanks Eugen, see attached infos.
> >> >>>>
> >> >>>> Some more details:
> >> >>>>
> >> >>>> - commands that actually hang: ceph balancer status; rbd -p vms ls;
> >> >>>>   rados -p vms_cache cache-flush-evict-all
> >> >>>> - all scrubs running on vms_cache PGs stall and restart in a loop
> >> >>>>   without actually doing anything
> >> >>>> - all I/O is at 0, both in ceph status and in iostat on the nodes
> >> >>>>
> >> >>>> On Tue, Feb 20, 2024 at 10:00 AM Eugen Block <eblock@xxxxxx> wrote:
> >> >>>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> some more details would be helpful, for example what's the pool size
> >> >>>>> of the cache pool? Did you issue a PG split before or during the
> >> >>>>> upgrade? This thread [1] deals with the same problem; the described
> >> >>>>> workaround was to set hit_set_count to 0 and disable the cache layer
> >> >>>>> until that is resolved. Afterwards you could enable the cache layer
> >> >>>>> again. But keep in mind that the code for cache tiering is entirely
> >> >>>>> removed in Reef (IIRC).
> >> >>>>>
> >> >>>>> Regards,
> >> >>>>> Eugen
> >> >>>>>
> >> >>>>> [1]
> >> >>>>> https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd
> >> >>>>>
> >> >>>>> Zitat von Cedric <yipikai7@xxxxxxxxx>:
> >> >>>>>
> >> >>>>>> Hello,
> >> >>>>>>
> >> >>>>>> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
> >> >>>>>> encountered an issue with a cache pool becoming completely stuck;
> >> >>>>>> the relevant message is below:
> >> >>>>>>
> >> >>>>>> pg xx.x has invalid (post-split) stats; must scrub before tier agent
> >> >>>>>> can activate
> >> >>>>>>
> >> >>>>>> In the OSD logs, scrubs start in a loop without succeeding for all
> >> >>>>>> PGs of this pool.
> >> >>>>>>
> >> >>>>>> What we have already tried without luck so far:
> >> >>>>>>
> >> >>>>>> - shutdown / restart the OSDs
> >> >>>>>> - rebalance PGs between OSDs
> >> >>>>>> - raise the memory on the OSDs
> >> >>>>>> - repeer the PGs
> >> >>>>>>
> >> >>>>>> Any idea what is causing this? Any help will be greatly appreciated.
> >> >>>>>>
> >> >>>>>> Thanks
> >> >>>>>>
> >> >>>>>> Cédric
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx