How "wide" is "wide". I have 4 nodes and 140 HDD OSDs. Here is the info as from the Ceph system: # ceph osd erasure-code-profile get hdd_ec crush-device-class=hdd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8 Here is what your script gives: # python tools/ceph-pool-pg-distribution 3 Searching for PGs in pools: ['3'] Summary: 2048 PGs on 140 osds Num OSDs with X PGs: 43: 16 44: 124 ... and finally your last proposal, so it looks like I have some left-over pgs, see below. I'm also observing PG values than 43/44 on other osds in the system. # ceph daemon osd.0 status { "cluster_fsid": "55633ec3-6c0c-4a02-990c-0f87e0f7a01f", "osd_fsid": "85e266f1-8d8c-4c2a-b03c-0aef9bc4e532", "whoami": 0, "state": "active", "oldest_map": 99775, "newest_map": 281713, "num_pgs": 77 } I found this ticket (https://tracker.ceph.com/issues/38931 I believe you actually opened it ;-)) and tried to restart the osd.0 and now the OSD is scrubbing some of its pgs... However, I'm uncertain that this is actually trimming the left-over pgs. Thanks for all your help up to this point already! Best wishes, Manuel On Wed, Jul 28, 2021 at 9:55 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote: > How wide is hdd_ec? With a wide EC rule and relatively few OSDs and > relatively few PGs per OSD for the pool, it can be impossible for the > balancer to make things perfect. > It would help to look at the PG distribution for only the hdd_ec pool > -- this script can help > > https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution > > Another possibility is that osd.0 has some leftover data from PGs that > should have been deleted. From the box, check: `ceph daemon osd.0 > status` and compare the number of PGs it holds vs the value in your > osd df output (48). > > -- dan > > On Wed, Jul 28, 2021 at 9:24 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> > wrote: > > > > Hi, > > > > thanks for your quick response. I already did this earlier this week: > > > > # ceph config dump | grep upmap_max_deviation > > mgr advanced mgr/balancer/upmap_max_deviation > 1 > > > > Cheers, > > Manuel > > > > On Wed, Jul 28, 2021 at 9:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> > wrote: > >> > >> Hi, > >> > >> Start by setting: > >> > >> ceph config set mgr mgr/balancer/upmap_max_deviation 1 > >> > >> This configures the balancer to squeeze the OSDs to within 1 PG of > eachother. > >> > >> I'm starting to think this should be the default. > >> > >> Cheers, dan > >> > >> > >> On Wed, Jul 28, 2021 at 9:08 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> > wrote: > >> > > >> > Dear all, > >> > > >> > I'm running Ceph 14.2.11. I have 140 HDDs in my cluster of 4 nodes, > 35 HDDs > >> > per node. I am observing fill ratios of 66% to 70% of OSDs and then > one > >> > with 82% (see attached ceph-osd-df.txt for output of "ceph osd df"). > >> > > >> > Previously, I had problems with single OSDs filling up to 85% and then > >> > everything coming to a grinding halt. Ideally, I would like to have > all OSD > >> > fill grade to be close to the mean of 67%... At the very least I need > to > >> > get the 82% OSD back into the range. 
> >> >
> >> > I have upmap balancing enabled and the balancer says:
> >> >
> >> > # ceph balancer status
> >> > {
> >> >     "last_optimize_duration": "0:00:00.053686",
> >> >     "plans": [],
> >> >     "mode": "upmap",
> >> >     "active": true,
> >> >     "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
> >> >     "last_optimize_started": "Wed Jul 28 09:03:02 2021"
> >> > }
> >> >
> >> > Creating an offline balancing plan looks like this:
> >> >
> >> > # ceph osd getmap -o om
> >> > got osdmap epoch 281708
> >> > # osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1 --upmap-active
> >> > osdmaptool: osdmap file 'om'
> >> > writing upmap command output to: out.txt
> >> > checking for upmap cleanups
> >> > upmap, max-count 10, max deviation 1
> >> > limiting to pools hdd_ec ([3])
> >> > pools hdd_ec
> >> > prepared 0/10 changes
> >> > Time elapsed 0.0275739 secs
> >> > Unable to find further optimization, or distribution is already perfect
> >> > osd.0 pgs 43
> >> > [...]
> >> > # wc -l out.txt
> >> > 0 out.txt
> >> >
> >> > Does anyone have a suggestion on how to get the 82% OSD closer to
> >> > the mean fill ratio (and maybe the other OSDs as well)?
> >> >
> >> > Thanks,
> >> > Manuel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
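
A quick back-of-the-envelope check on the numbers quoted in this thread,
assuming pg_num=2048 for pool 3 (as the ceph-pool-pg-distribution output
states) and the k=2/m=1 profile shown for hdd_ec: each EC PG places
k+m = 3 chunks on distinct OSDs, so the pool spreads 2048 * 3 = 6144 PG
shards over 140 OSDs, i.e. about 43.9 per OSD. The reported split of 16
OSDs with 43 PGs and 124 with 44 accounts for exactly 6144 shards, so for
this pool the upmap balancer really has nothing left to do. A minimal
sketch of the arithmetic:

    # Back-of-the-envelope check of the hdd_ec PG distribution quoted in the
    # thread. Assumes pg_num=2048 for pool 3 and the k=2/m=1 EC profile.
    pg_num = 2048            # "Summary: 2048 PGs on 140 osds"
    k, m = 2, 1              # from "ceph osd erasure-code-profile get hdd_ec"
    num_osds = 140

    shards = pg_num * (k + m)          # each EC PG stores k+m chunks on distinct OSDs
    ideal_per_osd = shards / num_osds  # ideal shard count per OSD

    # Split reported by ceph-pool-pg-distribution: 16 OSDs at 43, 124 OSDs at 44.
    reported = 16 * 43 + 124 * 44

    print(f"total shards: {shards}, ideal per OSD: {ideal_per_osd:.2f}")  # 6144, ~43.89
    print(f"shards accounted for by the 43/44 split: {reported}")         # 6144

If that arithmetic holds, the remaining fill-ratio imbalance has to come
from other pools on those OSDs or from leftover PG data, not from the
hdd_ec placement itself.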
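
To chase the leftover-PG suspicion across more OSDs than just osd.0,
something along these lines may help. It is only a sketch: it assumes
`ceph daemon osd.N status` is run on the host carrying each listed OSD
(the admin socket is local), and it assumes the JSON output of
`ceph osd df` exposes the PGS column as a per-node "pgs" field, which
should be verified on the running release.

    # Sketch: compare the PG count each local OSD daemon reports via its admin
    # socket ("num_pgs") with the PGS column of "ceph osd df", to spot OSDs
    # that still hold more PGs than the osdmap assigns to them, which could be
    # the leftover-PG situation from https://tracker.ceph.com/issues/38931.
    import json
    import subprocess

    def mapped_pgs():
        # Assumption: "ceph osd df -f json" has a "nodes" list with "id"/"pgs".
        out = subprocess.check_output(["ceph", "osd", "df", "-f", "json"])
        return {n["id"]: n["pgs"] for n in json.loads(out)["nodes"]}

    def daemon_pgs(osd_id):
        # Must run on the host where osd.<osd_id> lives (admin socket access).
        out = subprocess.check_output(["ceph", "daemon", f"osd.{osd_id}", "status"])
        return json.loads(out)["num_pgs"]

    if __name__ == "__main__":
        expected = mapped_pgs()
        local_osds = [0]  # fill in the OSD ids hosted on this box
        for osd_id in local_osds:
            actual = daemon_pgs(osd_id)
            note = "  <-- possibly holding leftover PGs" if actual > expected.get(osd_id, 0) else ""
            print(f"osd.{osd_id}: daemon num_pgs={actual}, osd df pgs={expected.get(osd_id)}{note}")

In the output quoted above, osd.0 reports num_pgs=77 while the osd df
output shows 48 PGs for it, and a gap like that is exactly what this
comparison would surface; whether a restart actually trims the stale PGs
is a separate question, as the tracker ticket discusses.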