Re: Handling out-of-balance OSD?

How wide is hdd_ec? With a wide EC rule, relatively few OSDs, and
relatively few PGs per OSD for the pool, it can be impossible for the
balancer to make things perfect.
It would help to look at the PG distribution for only the hdd_ec pool
-- this script can show it:
https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-pool-pg-distribution
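For example, to check how wide the pool actually is (width = k+m), and
then look at just that pool's PG distribution -- the profile commands are
standard, but I'm quoting the script invocation from memory, so check its
usage:

    ceph osd pool get hdd_ec erasure_code_profile        # which EC profile backs the pool
    ceph osd erasure-code-profile get <that profile>     # shows k and m; width = k+m

    ./tools/ceph-pool-pg-distribution hdd_ec             # per-pool PG counts per OSD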

Another possibility is that osd.0 has some leftover data from PGs that
should have been deleted. From the box, check: `ceph daemon osd.0
status` and compare the number of PGs it holds vs the value in your
osd df output (48).
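Something like this, from memory (the status output should include a
num_pgs field):

    ceph daemon osd.0 status        # num_pgs = what the daemon is actually holding
    ceph pg ls-by-osd 0 | wc -l     # roughly how many PGs the cluster map puts there

If the daemon is holding noticeably more PGs than the map says it should,
leftover PG data would explain the extra space used.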

-- dan

On Wed, Jul 28, 2021 at 9:24 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>
> Hi,
>
> thanks for your quick response. I already did this earlier this week:
>
> # ceph config dump | grep upmap_max_deviation
>   mgr       advanced mgr/balancer/upmap_max_deviation                    1
>
> Cheers,
> Manuel
>
> On Wed, Jul 28, 2021 at 9:15 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> Start by setting:
>>
>>     ceph config set mgr mgr/balancer/upmap_max_deviation 1
>>
>> This configures the balancer to squeeze the OSDs to within 1 PG of each other.
>>
>> I'm starting to think this should be the default.
>>
>> Cheers, dan
>>
>>
>> On Wed, Jul 28, 2021 at 9:08 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>> >
>> > Dear all,
>> >
>> > I'm running Ceph 14.2.11. I have 140 HDDs in my cluster of 4 nodes, 35 HDDs
>> > per node. I am observing fill ratios of 66% to 70% on most OSDs and then one
>> > at 82% (see the attached ceph-osd-df.txt for the output of "ceph osd df").
>> >
>> > Previously, I had problems with single OSDs filling up to 85% and then
>> > everything coming to a grinding halt. Ideally, I would like all OSD
>> > fill grades to be close to the mean of 67%... At the very least I need to
>> > get the 82% OSD back into that range.
>> >
>> > I have upmap balancing enabled and the balancer says:
>> >
>> > # ceph balancer status
>> > {
>> >     "last_optimize_duration": "0:00:00.053686",
>> >     "plans": [],
>> >     "mode": "upmap",
>> >     "active": true,
>> >     "optimize_result": "Unable to find further optimization, or pool(s)
>> > pg_num is decreasing, or distribution is already perfect",
>> >     "last_optimize_started": "Wed Jul 28 09:03:02 2021"
>> > }
>> >
>> > Creating an offline balancing plan looks like this:
>> >
>> > # ceph osd getmap -o om
>> > got osdmap epoch 281708
>> > # osdmaptool om --upmap out.txt --upmap-pool hdd_ec --upmap-deviation 1
>> > --upmap-active
>> > osdmaptool: osdmap file 'om'
>> > writing upmap command output to: out.txt
>> > checking for upmap cleanups
>> > upmap, max-count 10, max deviation 1
>> >  limiting to pools hdd_ec ([3])
>> > pools hdd_ec
>> > prepared 0/10 changes
>> > Time elapsed 0.0275739 secs
>> > Unable to find further optimization, or distribution is already perfect
>> > osd.0 pgs 43
>> > [...]
>> > # wc -l out.txt
>> > 0 out.txt
>> >
>> > Does anyone have a suggestion on how to get the 82% OSD closer to
>> > the mean fill ratio (and maybe the other OSDs as well)?
>> >
>> > Thanks,
>> > Manuel
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


