On 17/12/2024 17:01, Janek Bevendorff wrote:
I checked the ceph osd dump json-pretty output and validated it with a little Python script. Turns out, there's this somewhere around line 1200:

    "read_balance": {
        "score_acting": inf,
        "score_stable": inf,
        "optimal_score": 0,
        "raw_score_acting": 3,
        "raw_score_stable": 3,
        "primary_affinity_weighted": 0.9999845027923584,
        "average_primary_affinity": 1,
        "average_primary_affinity_weighted": 1
    }

The inf values seem to be the problem. These are the only two invalid JSON values in the whole file. Do you happen to know how I can debug/fix this?

On 17/12/2024 16:17, Janek Bevendorff wrote:
> Thanks. I tried running the command (dry run for now), but something's not working as expected. Have you ever seen this?
>
>     $ /root/go/bin/pgremapper cancel-backfill --verbose
>     ** executing: ceph osd dump -f json
>     panic: invalid character 'i' looking for beginning of value
>
>     goroutine 1 [running]:
>     main.mustParseCephCommand({0xc000b00000?, 0x0?}, {0x0?, 0x0?}, {0x59c9c0?, 0xc00011af30?})
>         /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:743 +0xe6
>     main.osdDump()
>         /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/ceph.go:517 +0x53
>     main.mustGetCurrentMappingState()
>         /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/mappingstate.go:54 +0x1d
>     main.glob..func9(0x73f540?, {0x5dabc9?, 0x1?, 0x1?})
>         /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:133 +0x1ef
>     github.com/spf13/cobra.(*Command).execute(0x73f540, {0xc0001188a0, 0x1, 0x1})
>         /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 +0x663
>     github.com/spf13/cobra.(*Command).ExecuteC(0x73f040)
>         /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960 +0x39c
>     github.com/spf13/cobra.(*Command).Execute(...)
>         /root/go/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
>     main.main()
>         /root/go/pkg/mod/github.com/digitalocean/pgremapper@v0.0.0-20240313130618-268522c0f6d5/main.go:740 +0x25
>
> Somehow it's choking here while trying to dump the OSDs: https://github.com/digitalocean/pgremapper/blob/main/ceph.go#L741
>
> There isn't an issue report about this.
>
> On 17/12/2024 15:59, Janne Johansson wrote:
>>> You can use pg-remapper (https://github.com/digitalocean/pgremapper) or similar tools to cancel the remapping; up-map entries will be created that reflect the current state of the cluster. After all currently running backfills are finished your mons should not be blocked anymore. I would also disable the balancer temporarily since it will trigger new backfills for those PGs that are not at their optimal locations. After mons are fine again you can just enable the balancer. This requires a ceph release and ceph clients with up-map support. Not tested in real life, but this approach might work.
>>
>> We use that approach at times, just so that there isn't a long long queue of PGs in the remapped state, and as far as I can tell, it is quite safe. You just programmatically tell each PG that there is an upmap entry for it telling it to be exactly where it is now, and then it isn't "misplaced" anymore. When you enable the balancer it will take a percentage of these and just remove their individual upmap entry, and they start to move as needed. If you want to have a small movement, set the max balancer to a really low value, and few PGs will be moving at the same time. If your wpq/mclock settings work ok for you, you can have a large percentage and let the IO scheduler prioritize for you.
>>
>> But as Burkhard says, setting "norebalance" for a moment, having the balancer disabled and then running one of these tools once or twice will make all PGs active+clean where they are, even if that isn't the desired end location for them. This should help your mons a lot, then enable the balancer and unset "norebalance" and let it finish the last PGs you have in the wrong spot.
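
The sequence described above (set "norebalance", disable the balancer, run the remapping tool, and later re-enable both) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the pgremapper path is the one from the session quoted above, and it assumes that pgremapper's --yes flag is what turns its default dry run into real changes. It also assumes the inf parsing problem in the ceph osd dump output has been dealt with first, since cancel-backfill cannot get past it.

```python
import shlex
import subprocess

# Assumption: pgremapper is installed at the path seen in the quoted session.
PGREMAPPER = "/root/go/bin/pgremapper"


def freeze_commands(apply=False):
    """Commands that pin remapped PGs to where they currently are."""
    cancel = [PGREMAPPER, "cancel-backfill", "--verbose"]
    if apply:
        # Assumption: --yes makes pgremapper apply its changes; without it
        # the tool only reports what it would do (dry run).
        cancel.append("--yes")
    return [
        ["ceph", "osd", "set", "norebalance"],
        ["ceph", "balancer", "off"],
        cancel,
    ]


def resume_commands():
    """Commands to run later, once the mons have recovered."""
    return [
        ["ceph", "balancer", "on"],
        ["ceph", "osd", "unset", "norebalance"],
    ]


def run(commands, execute=False):
    """Print each command; run it only when execute=True is passed."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(freeze_commands())   # prints only, nothing is executed
    run(resume_commands())   # prints only, nothing is executed
```

As written, nothing is executed: the commands are only printed so they can be reviewed, and both apply=True and execute=True have to be passed deliberately before anything touches the cluster.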
-- 
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx