On 8/26/19 1:35 PM, Simon Oosthoek wrote:
> On 26-08-19 13:25, Simon Oosthoek wrote:
>> On 26-08-19 13:11, Wido den Hollander wrote:
>> <snip>
>>>
>>> The reweight might actually cause even more confusion for the
>>> balancer. The balancer uses upmap mode and that re-allocates PGs to
>>> different OSDs if needed.
>>>
>>> Looking at the output sent earlier I have some replies. See below.
>>>
>> <snip>
>>>
>>> Looking at this output the balancing seems OK, but from a different
>>> perspective.
>>>
>>> PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
>>> Placement Groups allocated.
>>>
>>> That's good! An almost perfect distribution.
>>>
>>> The problem that now arises is the difference in the size of these
>>> Placement Groups, as they hold different objects.
>>>
>>> This is one of the side effects of larger disks. The PGs on them
>>> will grow and this will lead to imbalance between the OSDs.
>>>
>>> I *think* that increasing the number of PGs on this cluster would
>>> help, but only for the pools which will contain most of the data.
>>>
>>> This will consume a bit more CPU power and memory, but on modern
>>> systems this should be less of a problem.
>>>
>>> The good thing is that with Nautilus you can also scale the number
>>> of PGs back down if that ever becomes a problem.
>>>
>>> More PGs will mean smaller PGs and thus lead to a better data
>>> distribution.
>> <snip>
>>
>> That makes sense, dividing the data into smaller chunks makes it more
>> flexible. The OSD nodes are quite underloaded, even with turbo
>> recovery mode on (10, not 32 ;-).
>>
>> When the cluster is in HEALTH_OK again, I'll increase the PGs for the
>> cephfs pools...
>
> On second thought, I reverted my reweight commands and adjusted the
> PGs, which were quite low for some of the pools. The reason they were
> low is that when we first created them, we expected them to be rarely
> used, but then we started filling them just for the sake of it, and
> these are probably the cause of the imbalance.
>

You should make sure that the pools which contain the most data have
the most PGs. Although ~100 PGs per OSD is the recommendation, it won't
hurt to have ~200 PGs as long as you have enough CPU power and memory.

More PGs will mean better data distribution with such large disks. (A
rough worked example is at the end of this mail.)

> The cluster now has over 8% misplaced objects, so that can take a
> while...
>
> Cheers
>
> /Simon
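
To make that concrete, here is a rough sketch. The OSD count,
replication factor and pool names below are made up for illustration;
they are not taken from your cluster, so substitute your own numbers.
With, say, 24 OSDs, 3x replication and a target of ~100 PGs per OSD,
you have roughly 24 * 100 / 3 = 800 PGs to distribute across all
pools. Give most of that budget to the pool that will hold most of the
data and round to a power of two, e.g.:

  ceph osd pool set cephfs_data pg_num 512
  ceph osd pool set cephfs_metadata pg_num 32

Depending on your release you may also need to raise pgp_num to match
before the data actually starts moving:

  ceph osd pool set cephfs_data pgp_num 512

On Nautilus you can also let the pg_autoscaler mgr module do this
calculation for you:

  ceph mgr module enable pg_autoscaler
  ceph osd pool set cephfs_data pg_autoscale_mode on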