Re: Handling out-of-balance OSD?

Cool, looks like the second problem is the real issue here :)
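
(The 43/44 spread itself is about as good as it can get: if my
arithmetic is right, 2048 PGs x 3 shards (k=2, m=1) spread over 140
OSDs is about 43.9 PGs per OSD, so a 43/44 split is already optimal.)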

IIRC, you can remove the leftover PGs with ceph-objectstore-tool. I
don't recall the exact syntax, but you'd need to find out which PGs
are still mapped to that OSD by the current crush rule and remove the
others.
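
Roughly something like this (untested, so please double-check the exact
flags against the ceph-objectstore-tool docs for your release, and
adjust the data path). First see which PGs the crush map currently maps
to osd.0:

    ceph pg ls-by-osd 0

Then, with osd.0 stopped, list the PGs physically present on its disk:

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs

Anything in the second list that is not in the first should be a
leftover and could be removed with something like:

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op remove --pgid <pgid> --force
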
Or, you can zap and re-create the OSD.

-- dan


On Wed, Jul 28, 2021 at 10:34 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>
> How "wide" is "wide". I have 4 nodes and 140 HDD OSDs. Here is the info as from the Ceph system:
>
> # ceph osd erasure-code-profile get hdd_ec
> crush-device-class=hdd
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=2
> m=1
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Here is what your script gives:
>
> # python tools/ceph-pool-pg-distribution 3
> Searching for PGs in pools: ['3']
> Summary: 2048 PGs on 140 osds
>
> Num OSDs with X PGs:
> 43: 16
> 44: 124
>
> ... and finally your last proposal: it looks like I have some left-over PGs, see below. I'm also observing PG counts other than 43/44 on other OSDs in the system.
>
> # ceph daemon osd.0 status
> {
>     "cluster_fsid": "55633ec3-6c0c-4a02-990c-0f87e0f7a01f",
>     "osd_fsid": "85e266f1-8d8c-4c2a-b03c-0aef9bc4e532",
>     "whoami": 0,
>     "state": "active",
>     "oldest_map": 99775,
>     "newest_map": 281713,
>     "num_pgs": 77
> }
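>
> (If I read this correctly, the daemon reports 77 PGs while "ceph osd
> df" shows 48 for osd.0, so there seem to be roughly 29 stale PGs
> lingering on disk.)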
>
> I found this ticket (https://tracker.ceph.com/issues/38931, I believe you actually opened it ;-)) and restarted osd.0, and now the OSD is scrubbing some of its PGs... However, I'm uncertain that this actually trims the left-over PGs.
>
> Thanks for all your help up to this point already!
>
> Best wishes,
> Manuel
>
> On Wed, Jul 28, 2021 at 9:55 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> How wide is hdd_ec? With a wide EC rule and relatively few OSDs and
>> relatively few PGs per OSD for the pool, it can be impossible for the
>> balancer to make things perfect.
>> It would help to look at the PG distribution for only the hdd_ec pool
>> -- this script can help
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution
>>
>> Another possibility is that osd.0 has some leftover data from PGs that
>> should have been deleted. From the box, check: `ceph daemon osd.0
>> status` and compare the number of PGs it holds vs the value in your
>> osd df output (48).
>>
>> -- dan
>>
>> On Wed, Jul 28, 2021 at 9:24 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>> >
>> > Hi,
>> >
>> > thanks for your quick response. I already did this earlier this week:
>> >
>> > # ceph config dump | grep upmap_max_deviation
>> >   mgr       advanced mgr/balancer/upmap_max_deviation                    1
>> >
>> > Cheers,
>> > Manuel
>> >
>> > On Wed, Jul 28, 2021 at 9:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Start by setting:
>> >>
>> >>     ceph config set mgr mgr/balancer/upmap_max_deviation 1
>> >>
>> >> This configures the balancer to squeeze the OSDs to within 1 PG of each other.
>> >>
>> >> I'm starting to think this should be the default.
>> >>
>> >> Cheers, dan
>> >>
>> >>
>> >> On Wed, Jul 28, 2021 at 9:08 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>> >> >
>> >> > Dear all,
>> >> >
>> >> > I'm running Ceph 14.2.11. I have 140 HDDs in my cluster of 4 nodes, 35 HDDs
>> >> > per node. I am observing fill ratios of 66% to 70% on most OSDs, and then
>> >> > one at 82% (see attached ceph-osd-df.txt for output of "ceph osd df").
>> >> >
>> >> > Previously, I had problems with single OSDs filling up to 85% and then
>> >> > everything coming to a grinding halt. Ideally, I would like all OSD fill
>> >> > levels to be close to the mean of 67%... At the very least I need to
>> >> > get the 82% OSD back into the range.
>> >> >
>> >> > I have upmap balancing enabled and the balancer says:
>> >> >
>> >> > # ceph balancer status
>> >> > {
>> >> >     "last_optimize_duration": "0:00:00.053686",
>> >> >     "plans": [],
>> >> >     "mode": "upmap",
>> >> >     "active": true,
>> >> >     "optimize_result": "Unable to find further optimization, or pool(s)
>> >> > pg_num is decreasing, or distribution is already perfect",
>> >> >     "last_optimize_started": "Wed Jul 28 09:03:02 2021"
>> >> > }
>> >> >
>> >> > Creating an offline balancing plan looks like this:
>> >> >
>> >> > # ceph osd getmap -o om
>> >> > got osdmap epoch 281708
>> >> > # osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1
>> >> > --upmap-active
>> >> > osdmaptool: osdmap file 'om'
>> >> > writing upmap command output to: out.txt
>> >> > checking for upmap cleanups
>> >> > upmap, max-count 10, max deviation 1
>> >> >  limiting to pools hdd_ec ([3])
>> >> > pools hdd_ec
>> >> > prepared 0/10 changes
>> >> > Time elapsed 0.0275739 secs
>> >> > Unable to find further optimization, or distribution is already perfect
>> >> > osd.0 pgs 43
>> >> > [...]
>> >> > # wc -l out.txt
>> >> > 0 out.txt
>> >> >
>> >> > Does anyone have a suggestion on how to get the 82% OSD closer to
>> >> > the mean fill ratio (and maybe the other OSDs as well)?
>> >> >
>> >> > Thanks,
>> >> > Manuel
>> >> > _______________________________________________
>> >> > ceph-users mailing list -- ceph-users@xxxxxxx
>> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


