On 8/26/19 1:35 PM, Simon Oosthoek wrote:
> On 26-08-19 13:25, Simon Oosthoek wrote:
>> On 26-08-19 13:11, Wido den Hollander wrote:
>> <snip>
>>>
>>> The reweight might actually cause even more confusion for the
>>> balancer. The balancer uses upmap mode and that re-allocates PGs to
>>> different OSDs if needed.
>>>
>>> Looking at the output sent earlier I have some replies. See below.
>>>
>> <snip>
>>>
>>> Looking at this output the balancing seems OK, but from a different
>>> perspective.
>>>
>>> PGs are allocated to OSDs, not objects or data. All OSDs have 95~97
>>> Placement Groups allocated.
>>>
>>> That's good! An almost perfect distribution.
>>>
>>> The problem that now arises is the difference in the size of these
>>> Placement Groups, as they hold different objects.
>>>
>>> This is one of the side effects of larger disks. The PGs on them
>>> will grow and this will lead to imbalance between the OSDs.
>>>
>>> I *think* that increasing the number of PGs on this cluster would
>>> help, but only for the pools which will contain most of the data.
>>>
>>> This will consume a bit more CPU power and memory, but on modern
>>> systems this should be less of a problem.
>>>
>>> The good thing is that with Nautilus you can also scale the number
>>> of PGs back down if that ever becomes a problem.
>>>
>>> More PGs will mean smaller PGs and thus lead to a better data
>>> distribution.
>> <snip>
>>
>> That makes sense, dividing the data into smaller chunks makes it more
>> flexible. The OSD nodes are quite underloaded, even with turbo
>> recovery mode on (10, not 32 ;-).
>>
>> When the cluster is in HEALTH_OK again, I'll increase the PGs for the
>> cephfs pools...
>
> On second thought, I reverted my reweight commands and adjusted the
> PGs, which were quite low for some of the pools. The reason they were
> low is that when we first created them, we expected them to be rarely
> used, but then we started filling them just for the sake of it, and
> these are probably the cause of the imbalance.
>

You should make sure that the pools which contain the most data have
the most PGs. Although ~100 PGs per OSD is the recommendation, it won't
hurt to have ~200 PGs as long as you have enough CPU power and memory.

More PGs will mean better data distribution with such large disks. (A
rough worked example is at the end of this mail.)

> The cluster now has over 8% misplaced objects, so that can take a
> while...
>
> Cheers
>
> /Simon
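
To make that concrete, here is a rough sketch. The OSD count,
replication factor and pool names below are made up for illustration;
they are not taken from your cluster, so substitute your own numbers.
With, say, 24 OSDs, 3x replication and a target of ~100 PGs per OSD,
you have roughly 24 * 100 / 3 = 800 PGs to distribute across all
pools. Give most of that budget to the pool that will hold most of the
data and round to a power of two, e.g.:

  ceph osd pool set cephfs_data pg_num 512
  ceph osd pool set cephfs_metadata pg_num 32

Depending on your release you may also need to raise pgp_num to match
before the data actually starts moving:

  ceph osd pool set cephfs_data pgp_num 512

On Nautilus you can also let the pg_autoscaler mgr module do this
calculation for you:

  ceph mgr module enable pg_autoscaler
  ceph osd pool set cephfs_data pg_autoscale_mode on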