Re: MONs not trimming


 



Looks like there is something wrong with the .mgr pool; all other pools have proper values. For now I've patched the pgremapper source code to replace the inf values with 0 before unmarshaling the JSON, which at least made the tool work. I guess it's safe to just delete that pool and let the MGRs recreate it? (Is it?)
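
In case it's useful to anyone else, the workaround boils down to something like the sketch below. It's a standalone illustration rather than the actual diff, and sanitizeInf/infRe are just names I made up for it: the idea is simply to rewrite the bare inf tokens to 0 before the buffer reaches json.Unmarshal.

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

// infRe matches a bare inf or -inf token used as a JSON value (i.e. right
// after a colon). sanitizeInf rewrites those tokens to 0 so that
// encoding/json will accept the dump. A blunt workaround, not a proper fix.
var infRe = regexp.MustCompile(`:\s*-?inf\b`)

func sanitizeInf(raw []byte) []byte {
    return infRe.ReplaceAll(raw, []byte(": 0"))
}

func main() {
    raw := []byte(`{"read_balance": {"score_acting": inf, "score_stable": inf}}`)

    var v map[string]interface{}
    if err := json.Unmarshal(sanitizeInf(raw), &v); err != nil {
        panic(err)
    }
    fmt.Println(v) // map[read_balance:map[score_acting:0 score_stable:0]]
}

The regexp only looks for inf directly after a colon, which is good enough for the osd dump output, but it's obviously a band-aid rather than a fix for whatever writes inf into the osdmap in the first place.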


On 17/12/2024 17:01, Janek Bevendorff wrote:
I checked the output of ceph osd dump -f json-pretty and validated it with a little Python script. Turns out, there's this somewhere around line 1200:

            "read_balance": {
                "score_acting": inf,
                "score_stable": inf,
                "optimal_score": 0,
                "raw_score_acting": 3,
                "raw_score_stable": 3,
                "primary_affinity_weighted": 0.9999845027923584,
                "average_primary_affinity": 1,
                "average_primary_affinity_weighted": 1
            }


The inf values seem to be the problem. These are the only two invalid JSON values in the whole file. Do you happen to know how I can debug/fix this?
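
For context, a bare inf simply isn't a legal JSON value, and Go's encoding/json (which, judging by the error text, is what pgremapper parses the dump with) refuses it with exactly the message from the panic quoted below. A minimal reproduction, just to confirm that's where the error comes from:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // inf is not a valid JSON token, so the decoder bails out on the first
    // character of the value.
    var v map[string]interface{}
    err := json.Unmarshal([]byte(`{"score_acting": inf}`), &v)
    fmt.Println(err) // invalid character 'i' looking for beginning of value
}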


On 17/12/2024 16:17, Janek Bevendorff wrote:
Thanks. I tried running the command (dry run for now), but something's not working as expected. Have you ever seen this?

$ /root/go/bin/pgremapper cancel-backfill --verbose
** executing: ceph osd dump -f json
panic: invalid character 'i' looking for beginning of value

goroutine 1 [running]:
main.mustParseCephCommand({0xc000b00000?, 0x0?}, {0x0?, 0x0?}, {0x59c9c0?, 0xc00011af30?})
/root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:743 +0xe6
main.osdDump()
/root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:517 +0x53
main.mustGetCurrentMappingState()
/root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/mappingstate.go:54 +0x1d
main.glob..func9(0x73f540?, {0x5dabc9?, 0x1?, 0x1?})
/root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:133 +0x1ef
github.com/spf13/cobra.(*Command).execute(0x73f540, {0xc0001188a0, 0x1, 0x1})
/root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0x73f040)
/root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
github.com/spf13/cobra.(*Command).Execute(...)
/root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
main.main()
/root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:740 +0x25


Somehow it's choking here while trying to dump the OSDs: https://github.com/digitalocean/pgremapper/blob/main/ceph.go#L741

There doesn't seem to be an existing issue report about this.



On 17/12/2024 15:59, Janne Johansson wrote:
You can use pgremapper (https://github.com/digitalocean/pgremapper) or similar tools to cancel the remapping; upmap entries will be created that reflect the current state of the cluster. After all currently running backfills are finished, your mons should not be blocked anymore. I would also disable the balancer temporarily, since it will trigger new backfills for those PGs that are not at their optimal locations. Once the mons are fine again you can just re-enable the balancer. This requires a Ceph release and Ceph clients with upmap support. Not tested in real life, but this approach might work.

We use that approach at times, just so that there isn't a long queue of PGs in the remapped state, and as far as I can tell it is quite safe: you just programmatically tell each PG that there is an upmap entry for it, telling it to be exactly where it is now, and then it isn't "misplaced" anymore. When you enable the balancer, it will take a percentage of these, remove their individual upmap entries, and they start to move as needed.

If you want only a small amount of movement, set the balancer's max-misplaced ratio to a really low value and few PGs will be moving at the same time. If your wpq/mclock settings work OK for you, you can allow a large percentage and let the IO scheduler prioritize for you. But as Burkhard says, setting "norebalance" for a moment, keeping the balancer disabled, and then running one of these tools once or twice will make all PGs active+clean where they are, even if that isn't their desired end location. This should help your mons a lot; afterwards, enable the balancer, unset "norebalance", and let it finish the last PGs you have in the wrong spot.



--
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
