Re: MONs not trimming

Hey Janek,

Ah, yes, we ran into that invalid JSON output in
https://github.com/digitalocean/ceph_exporter as well. I have a patch
I wrote for ceph_exporter that I can port over to pgremapper (it does
something similar to what your patch does).
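
The gist is to sanitize the raw dump before unmarshaling it. A minimal
sketch of that idea (not the actual patch; the helper name and the
regex are illustrative):

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

// "ceph osd dump -f json" can emit bare inf/-inf/nan tokens, which are
// not valid JSON. Replace them with 0 before handing the bytes to
// json.Unmarshal. (Illustrative helper, not the real pgremapper code.)
var invalidNum = regexp.MustCompile(`(?i):\s*-?(inf(inity)?|nan)`)

func sanitize(raw []byte) []byte {
    return invalidNum.ReplaceAll(raw, []byte(": 0"))
}

func main() {
    raw := []byte(`{"read_balance": {"score_acting": inf, "optimal_score": 0}}`)
    var parsed map[string]interface{}
    if err := json.Unmarshal(sanitize(raw), &parsed); err != nil {
        panic(err)
    }
    fmt.Println(parsed["read_balance"])
}

Zeroing the scores is a hack, of course, but it keeps the decoder from
bailing out.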

Josh

On Tue, Dec 17, 2024 at 9:38 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Looks like there is something wrong with the .mgr pool. All others have
> proper values. For now I've patched the pgremapper source code to
> replace the inf values with 0 before unmarshaling the JSON. That at
> least made the tool work. I guess it's safe to just delete that pool and
> let the MGRs recreate it? (Is it?)
>
>
> On 17/12/2024 17:01, Janek Bevendorff wrote:
> > I checked the ceph osd dump -f json-pretty output and validated it with
> > a little Python script. Turns out, there's this somewhere around line 1200:
> >
> >             "read_balance": {
> >                 "score_acting": inf,
> >                 "score_stable": inf,
> >                 "optimal_score": 0,
> >                 "raw_score_acting": 3,
> >                 "raw_score_stable": 3,
> >                 "primary_affinity_weighted": 0.9999845027923584,
> >                 "average_primary_affinity": 1,
> >                 "average_primary_affinity_weighted": 1
> >             }
> >
> >
> > The inf values seem to be the problem. These are the only two invalid
> > JSON values in the whole file. Do you happen to know how I can
> > debug/fix this?
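> >
> > For reference, the check itself just parses the dump and prints where
> > the parser gives up. A sketch of the same idea in Go (my script is
> > Python; the file name here is illustrative):
> >
> > package main
> >
> > import (
> >     "encoding/json"
> >     "fmt"
> >     "os"
> > )
> >
> > func main() {
> >     // A saved copy of "ceph osd dump -f json" (illustrative file name).
> >     raw, err := os.ReadFile("osd-dump.json")
> >     if err != nil {
> >         panic(err)
> >     }
> >     var v interface{}
> >     if err := json.Unmarshal(raw, &v); err != nil {
> >         // json.SyntaxError carries the byte offset of the bad token.
> >         if se, ok := err.(*json.SyntaxError); ok {
> >             fmt.Printf("invalid JSON at byte %d: %v\n", se.Offset, err)
> >             return
> >         }
> >         panic(err)
> >     }
> >     fmt.Println("the dump is valid JSON")
> > }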
> >
> >
> > On 17/12/2024 16:17, Janek Bevendorff wrote:
> >> Thanks. I tried running the command (dry run for now), but
> >> something's not working as expected. Have you ever seen this?
> >>
> >> $ /root/go/bin/pgremapper cancel-backfill --verbose
> >> ** executing: ceph osd dump -f json
> >> panic: invalid character 'i' looking for beginning of value
> >>
> >> goroutine 1 [running]:
> >> main.mustParseCephCommand({0xc000b00000?, 0x0?}, {0x0?, 0x0?},
> >> {0x59c9c0?, 0xc00011af30?})
> >> /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:743
> >> +0xe6
> >> main.osdDump()
> >> /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:517
> >> +0x53
> >> main.mustGetCurrentMappingState()
> >> /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/mappingstate.go:54
> >> +0x1d
> >> main.glob..func9(0x73f540?, {0x5dabc9?, 0x1?, 0x1?})
> >> /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:133
> >> +0x1ef
> >> github.com/spf13/cobra.(*Command).execute(0x73f540, {0xc0001188a0,
> >> 0x1, 0x1})
> >> /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x663
> >> github.com/spf13/cobra.(*Command).ExecuteC(0x73f040)
> >> /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
> >> github.com/spf13/cobra.(*Command).Execute(...)
> >> /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
> >> main.main()
> >> /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:740
> >> +0x25
> >>
> >>
> >> Somehow it's choking here while trying to dump the OSDs:
> >> https://github.com/digitalocean/pgremapper/blob/main/ceph.go#L741
> >>
> >> There isn't an issue report about this.
> >>
> >>
> >>
> >> On 17/12/2024 15:59, Janne Johansson wrote:
> >>>> You can use pgremapper
> >>>> (https://github.com/digitalocean/pgremapper) or
> >>>> similar tools to cancel the remapping; upmap entries will be created
> >>>> that reflect the current state of the cluster. After all currently
> >>>> running backfills are finished, your mons should not be blocked
> >>>> anymore.
> >>>> I would also disable the balancer temporarily since it will trigger
> >>>> new backfills for those PGs that are not at their optimal locations.
> >>>> After the mons are fine again you can just re-enable the balancer.
> >>>> This requires a Ceph release and Ceph clients with upmap support.
> >>>> Not tested in real life, but this approach might work.
> >>> We use that approach at times, just so that there isn't a long, long
> >>> queue of PGs in the remapped state, and as far as I can tell it is
> >>> quite safe. You just programmatically tell each PG that there is an
> >>> upmap entry for it, telling it to be exactly where it is now, and then
> >>> it isn't "misplaced" anymore. When you enable the balancer it will
> >>> take a percentage of these and just remove their individual upmap
> >>> entries, and they start to move as needed.
> >>> If you want only a small amount of movement, set the balancer's max
> >>> to a really low value, and few PGs will be moving at the same time.
> >>> If your wpq/mclock settings work OK for you, you can use a large
> >>> percentage and let the IO scheduler prioritize for you. But as
> >>> Burkhard says, setting "norebalance" for a moment, keeping the
> >>> balancer disabled, and then running one of these tools once or twice
> >>> will make all PGs active+clean where they are, even if that isn't
> >>> their desired end location. That should help your mons a lot; then
> >>> enable the balancer, unset "norebalance", and let it finish the last
> >>> PGs you have in the wrong spot.
> >>>
> >>>
> >
> >
>
> --
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



