Wait, after re-reading my own ticket I realized you can more easily remove
the leftover PGs by re-peering the *other* OSDs:

"I found a way to remove those leftover PGs (without using
ceph-objectstore-tool): If the PG re-peers, then osd.74 notices it's not in
the up/acting set and starts deleting the PG. So at the moment I'm
restarting those former peers to trim this OSD."

-- dan

On Wed, Jul 28, 2021 at 10:37 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Cool, looks like the second problem is the real issue here :)
>
> IIRC, you can remove the leftover PGs with ceph-objectstore-tool. I
> don't recall the exact syntax, but you'd need to find out which PGs
> are not mapped there by the current CRUSH rule and remove the others.
> Or, you can zap and re-create the OSD.
>
> -- dan
>
>
> On Wed, Jul 28, 2021 at 10:34 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
> >
> > How "wide" is "wide"? I have 4 nodes and 140 HDD OSDs. Here is the info from the Ceph system:
> >
> > # ceph osd erasure-code-profile get hdd_ec
> > crush-device-class=hdd
> > crush-failure-domain=host
> > crush-root=default
> > jerasure-per-chunk-alignment=false
> > k=2
> > m=1
> > plugin=jerasure
> > technique=reed_sol_van
> > w=8
> >
> > Here is what your script gives:
> >
> > # python tools/ceph-pool-pg-distribution 3
> > Searching for PGs in pools: ['3']
> > Summary: 2048 PGs on 140 osds
> >
> > Num OSDs with X PGs:
> > 43: 16
> > 44: 124
> >
> > ... and finally your last proposal: it looks like I have some leftover PGs, see below. I'm also observing PG counts other than 43/44 on other OSDs in the system.
> >
> > # ceph daemon osd.0 status
> > {
> >     "cluster_fsid": "55633ec3-6c0c-4a02-990c-0f87e0f7a01f",
> >     "osd_fsid": "85e266f1-8d8c-4c2a-b03c-0aef9bc4e532",
> >     "whoami": 0,
> >     "state": "active",
> >     "oldest_map": 99775,
> >     "newest_map": 281713,
> >     "num_pgs": 77
> > }
> >
> > I found this ticket (https://tracker.ceph.com/issues/38931 -- I believe you actually opened it ;-)) and tried restarting osd.0, and now the OSD is scrubbing some of its PGs... However, I'm uncertain that this is actually trimming the leftover PGs.
> >
> > Thanks for all your help up to this point already!
> >
> > Best wishes,
> > Manuel
> >
> > On Wed, Jul 28, 2021 at 9:55 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >>
> >> How wide is hdd_ec? With a wide EC rule and relatively few OSDs and
> >> relatively few PGs per OSD for the pool, it can be impossible for the
> >> balancer to make things perfect.
> >> It would help to look at the PG distribution for only the hdd_ec pool
> >> -- this script can help:
> >> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution
> >>
> >> Another possibility is that osd.0 has some leftover data from PGs that
> >> should have been deleted. From the box, check `ceph daemon osd.0
> >> status` and compare the number of PGs it holds vs the value in your
> >> osd df output (48).
> >>
> >> -- dan
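A minimal sketch of the leftover-PG check described in the messages above. The commands exist in Nautilus-era Ceph, but the OSD id, paths, and the systemd unit name are assumptions to adapt to your own cluster, and (as the top of the thread concludes) letting the PGs re-peer is the easier cleanup than removing anything by hand:

    # PGs that CRUSH currently maps to osd.0 (should match the 43/44 from the script)
    ceph pg ls-by-osd 0

    # PGs the daemon actually holds (77 in the output above, so roughly 33 leftovers)
    ceph daemon osd.0 status

    # With the OSD stopped (systemd-managed, non-containerized deployment assumed),
    # list what is physically present on disk; anything not in the ls-by-osd set
    # is a leftover PG.
    systemctl stop ceph-osd@0
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs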
> >>
> >> On Wed, Jul 28, 2021 at 9:24 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> >
> >> > thanks for your quick response. I already did this earlier this week:
> >> >
> >> > # ceph config dump | grep upmap_max_deviation
> >> > mgr        advanced  mgr/balancer/upmap_max_deviation  1
> >> >
> >> > Cheers,
> >> > Manuel
> >> >
> >> > On Wed, Jul 28, 2021 at 9:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Start by setting:
> >> >>
> >> >>     ceph config set mgr mgr/balancer/upmap_max_deviation 1
> >> >>
> >> >> This configures the balancer to squeeze the OSDs to within 1 PG of each other.
> >> >>
> >> >> I'm starting to think this should be the default.
> >> >>
> >> >> Cheers, dan
> >> >>
> >> >>
> >> >> On Wed, Jul 28, 2021 at 9:08 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
> >> >> >
> >> >> > Dear all,
> >> >> >
> >> >> > I'm running Ceph 14.2.11. I have 140 HDDs in my cluster of 4 nodes, 35 HDDs
> >> >> > per node. I am observing fill ratios of 66% to 70% on most OSDs and then one
> >> >> > at 82% (see the attached ceph-osd-df.txt for the output of "ceph osd df").
> >> >> >
> >> >> > Previously, I had problems with single OSDs filling up to 85% and then
> >> >> > everything coming to a grinding halt. Ideally, I would like all OSD
> >> >> > fill grades to be close to the mean of 67%... At the very least I need to
> >> >> > get the 82% OSD back into the range.
> >> >> >
> >> >> > I have upmap balancing enabled and the balancer says:
> >> >> >
> >> >> > # ceph balancer status
> >> >> > {
> >> >> >     "last_optimize_duration": "0:00:00.053686",
> >> >> >     "plans": [],
> >> >> >     "mode": "upmap",
> >> >> >     "active": true,
> >> >> >     "optimize_result": "Unable to find further optimization, or pool(s)
> >> >> > pg_num is decreasing, or distribution is already perfect",
> >> >> >     "last_optimize_started": "Wed Jul 28 09:03:02 2021"
> >> >> > }
> >> >> >
> >> >> > Creating an offline balancing plan looks like this:
> >> >> >
> >> >> > # ceph osd getmap -o om
> >> >> > got osdmap epoch 281708
> >> >> > # osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1
> >> >> > --upmap-active
> >> >> > osdmaptool: osdmap file 'om'
> >> >> > writing upmap command output to: out.txt
> >> >> > checking for upmap cleanups
> >> >> > upmap, max-count 10, max deviation 1
> >> >> > limiting to pools hdd_ec ([3])
> >> >> > pools hdd_ec
> >> >> > prepared 0/10 changes
> >> >> > Time elapsed 0.0275739 secs
> >> >> > Unable to find further optimization, or distribution is already perfect
> >> >> > osd.0 pgs 43
> >> >> > [...]
> >> >> > # wc -l out.txt
> >> >> > 0 out.txt
> >> >> >
> >> >> > Does anyone have a suggestion on how to get the 82% OSD closer to the
> >> >> > mean fill ratio (and maybe the other OSDs as well)?
> >> >> >
> >> >> > Thanks,
> >> >> > Manuel
> >> >> > _______________________________________________
> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx
> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
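The imbalance in this thread came from leftover PGs, not from the balancer itself. A minimal sketch of the follow-up once those PGs have been trimmed, assuming the hdd_ec pool and the Nautilus-era tools shown above; evaluating the pool score and sourcing out.txt are the usual upmap workflow, but verify the exact flags against your release:

    # Check how even the pool distribution really is (lower score = more even).
    ceph balancer eval hdd_ec

    # Re-run the offline plan from the thread; with a deviation of 1 the
    # balancer aims for at most 1 PG difference between OSDs in the pool.
    ceph osd getmap -o om
    osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1

    # out.txt contains plain "ceph osd pg-upmap-items ..." commands, so
    # sourcing it applies the proposed upmaps (a no-op when the file is empty).
    source out.txt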