How "wide" is "wide". I have 4 nodes and 140 HDD OSDs. Here is the info as from the Ceph system: # ceph osd erasure-code-profile get hdd_ec crush-device-class=hdd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8 Here is what your script gives: # python tools/ceph-pool-pg-distribution 3 Searching for PGs in pools: ['3'] Summary: 2048 PGs on 140 osds Num OSDs with X PGs: 43: 16 44: 124 ... and finally your last proposal, so it looks like I have some left-over pgs, see below. I'm also observing PG values than 43/44 on other osds in the system. # ceph daemon osd.0 status { "cluster_fsid": "55633ec3-6c0c-4a02-990c-0f87e0f7a01f", "osd_fsid": "85e266f1-8d8c-4c2a-b03c-0aef9bc4e532", "whoami": 0, "state": "active", "oldest_map": 99775, "newest_map": 281713, "num_pgs": 77 } I found this ticket (https://tracker.ceph.com/issues/38931 I believe you actually opened it ;-)) and tried to restart the osd.0 and now the OSD is scrubbing some of its pgs... However, I'm uncertain that this is actually trimming the left-over pgs. Thanks for all your help up to this point already! Best wishes, Manuel On Wed, Jul 28, 2021 at 9:55 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote: > How wide is hdd_ec? With a wide EC rule and relatively few OSDs and > relatively few PGs per OSD for the pool, it can be impossible for the > balancer to make things perfect. > It would help to look at the PG distribution for only the hdd_ec pool > -- this script can help > > https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution > > Another possibility is that osd.0 has some leftover data from PGs that > should have been deleted. From the box, check: `ceph daemon osd.0 > status` and compare the number of PGs it holds vs the value in your > osd df output (48). > > -- dan > > On Wed, Jul 28, 2021 at 9:24 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> > wrote: > > > > Hi, > > > > thanks for your quick response. I already did this earlier this week: > > > > # ceph config dump | grep upmap_max_deviation > > mgr advanced mgr/balancer/upmap_max_deviation > 1 > > > > Cheers, > > Manuel > > > > On Wed, Jul 28, 2021 at 9:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> > wrote: > >> > >> Hi, > >> > >> Start by setting: > >> > >> ceph config set mgr mgr/balancer/upmap_max_deviation 1 > >> > >> This configures the balancer to squeeze the OSDs to within 1 PG of > eachother. > >> > >> I'm starting to think this should be the default. > >> > >> Cheers, dan > >> > >> > >> On Wed, Jul 28, 2021 at 9:08 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> > wrote: > >> > > >> > Dear all, > >> > > >> > I'm running Ceph 14.2.11. I have 140 HDDs in my cluster of 4 nodes, > 35 HDDs > >> > per node. I am observing fill ratios of 66% to 70% of OSDs and then > one > >> > with 82% (see attached ceph-osd-df.txt for output of "ceph osd df"). > >> > > >> > Previously, I had problems with single OSDs filling up to 85% and then > >> > everything coming to a grinding halt. Ideally, I would like to have > all OSD > >> > fill grade to be close to the mean of 67%... At the very least I need > to > >> > get the 82% OSD back into the range. 
> >> >
> >> > I have upmap balancing enabled and the balancer says:
> >> >
> >> > # ceph balancer status
> >> > {
> >> >     "last_optimize_duration": "0:00:00.053686",
> >> >     "plans": [],
> >> >     "mode": "upmap",
> >> >     "active": true,
> >> >     "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
> >> >     "last_optimize_started": "Wed Jul 28 09:03:02 2021"
> >> > }
> >> >
> >> > Creating an offline balancing plan looks like this:
> >> >
> >> > # ceph osd getmap -o om
> >> > got osdmap epoch 281708
> >> > # osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1 --upmap-active
> >> > osdmaptool: osdmap file 'om'
> >> > writing upmap command output to: out.txt
> >> > checking for upmap cleanups
> >> > upmap, max-count 10, max deviation 1
> >> > limiting to pools hdd_ec ([3])
> >> > pools hdd_ec
> >> > prepared 0/10 changes
> >> > Time elapsed 0.0275739 secs
> >> > Unable to find further optimization, or distribution is already perfect
> >> > osd.0 pgs 43
> >> > [...]
> >> > # wc -l out.txt
> >> > 0 out.txt
> >> >
> >> > Does anyone have a suggestion on how to get the 82% OSD closer to
> >> > the mean fill ratio (and maybe the other OSDs as well)?
> >> >
> >> > Thanks,
> >> > Manuel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
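
A quick back-of-the-envelope check on the numbers quoted in this thread,
assuming pg_num=2048 for pool 3 (as the ceph-pool-pg-distribution output
states) and the k=2/m=1 profile shown for hdd_ec: each EC PG places
k+m = 3 chunks on distinct OSDs, so the pool spreads 2048 * 3 = 6144 PG
shards over 140 OSDs, i.e. about 43.9 per OSD. The reported split of 16
OSDs with 43 PGs and 124 with 44 accounts for exactly 6144 shards, so for
this pool the upmap balancer really has nothing left to do. A minimal
sketch of the arithmetic:

    # Back-of-the-envelope check of the hdd_ec PG distribution quoted in the
    # thread. Assumes pg_num=2048 for pool 3 and the k=2/m=1 EC profile.
    pg_num = 2048            # "Summary: 2048 PGs on 140 osds"
    k, m = 2, 1              # from "ceph osd erasure-code-profile get hdd_ec"
    num_osds = 140

    shards = pg_num * (k + m)          # each EC PG stores k+m chunks on distinct OSDs
    ideal_per_osd = shards / num_osds  # ideal shard count per OSD

    # Split reported by ceph-pool-pg-distribution: 16 OSDs at 43, 124 OSDs at 44.
    reported = 16 * 43 + 124 * 44

    print(f"total shards: {shards}, ideal per OSD: {ideal_per_osd:.2f}")  # 6144, ~43.89
    print(f"shards accounted for by the 43/44 split: {reported}")         # 6144

If that arithmetic holds, the remaining fill-ratio imbalance has to come
from other pools on those OSDs or from leftover PG data, not from the
hdd_ec placement itself.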
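
To chase the leftover-PG suspicion across more OSDs than just osd.0,
something along these lines may help. It is only a sketch: it assumes
`ceph daemon osd.N status` is run on the host carrying each listed OSD
(the admin socket is local), and it assumes the JSON output of
`ceph osd df` exposes the PGS column as a per-node "pgs" field, which
should be verified on the running release.

    # Sketch: compare the PG count each local OSD daemon reports via its admin
    # socket ("num_pgs") with the PGS column of "ceph osd df", to spot OSDs
    # that still hold more PGs than the osdmap assigns to them, which could be
    # the leftover-PG situation from https://tracker.ceph.com/issues/38931.
    import json
    import subprocess

    def mapped_pgs():
        # Assumption: "ceph osd df -f json" has a "nodes" list with "id"/"pgs".
        out = subprocess.check_output(["ceph", "osd", "df", "-f", "json"])
        return {n["id"]: n["pgs"] for n in json.loads(out)["nodes"]}

    def daemon_pgs(osd_id):
        # Must run on the host where osd.<osd_id> lives (admin socket access).
        out = subprocess.check_output(["ceph", "daemon", f"osd.{osd_id}", "status"])
        return json.loads(out)["num_pgs"]

    if __name__ == "__main__":
        expected = mapped_pgs()
        local_osds = [0]  # fill in the OSD ids hosted on this box
        for osd_id in local_osds:
            actual = daemon_pgs(osd_id)
            note = "  <-- possibly holding leftover PGs" if actual > expected.get(osd_id, 0) else ""
            print(f"osd.{osd_id}: daemon num_pgs={actual}, osd df pgs={expected.get(osd_id)}{note}")

In the output quoted above, osd.0 reports num_pgs=77 while the osd df
output shows 48 PGs for it, and a gap like that is exactly what this
comparison would surface; whether a restart actually trims the stale PGs
is a separate question, as the tracker ticket discusses.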