Hi,

Indeed, we reweight OSDs to balance them, just not very effectively on this particular cluster.

But I'm curious whether reweighting alone can fix this: if all of a host's OSDs are reweighted by, say, 0.5, does that result in other hosts being selected? Or do we need to change the CRUSH weights themselves to fix this kind of imbalance?

-- Dan

> On 22 Nov 2016, at 13:44, Bartłomiej Święcki <bartlomiej.swiecki@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> We've observed very similar problems in our clusters; it takes a lot of careful reweighting to keep the OSDs at more or less the same usage level.
> Because of that issue, we're currently trying to keep racks as regular as possible. I hope the patch you mentioned will address this too.
>
> Regards,
> Bartek
>
>
> On 11/22/2016 01:33 PM, Dan Van Der Ster wrote:
>> Hi,
>>
>> I have a couple of questions about http://tracker.ceph.com/issues/15653
>>
>> In the ticket, Sage discusses small/big drives, where the small drives get more data than expected.
>>
>> But we observe this at the rack level: our cluster has four racks, with 7, 8, 8, and 4 hosts respectively. The rack with 4 hosts is ~35% more full than the others.
>>
>> So AFAICT, because of #15653, CRUSH does not currently work well if you try to build a pool which is replicated rack/host-wise when your racks/hosts are not all ~identical in size.
>>
>> Are others noticing this pattern?
>> Or are we unusual in that our clusters are not flat/uniform in structure?
>>
>> Cheers, Dan
>> _______________________________________________
>> Ceph-large mailing list
>> Ceph-large@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
>
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
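
[Editor's sketch, for readers following the thread: the two knobs being contrasted above, shown with a hypothetical osd.12 and illustrative weights. This only describes what each command changes; whether the override reweight alone shifts data to other hosts is exactly the open question in the thread.]

    # Override reweight: a 0-1 multiplier applied to this OSD after CRUSH has
    # picked it; it does not change the weight of the parent host/rack buckets.
    ceph osd reweight 12 0.5

    # CRUSH reweight: changes the item's weight in the CRUSH map itself, which
    # is summed into the weights of its parent host and rack buckets.
    ceph osd crush reweight osd.12 0.5

    # Inspect per-OSD and per-bucket weights, reweights, and utilization.
    ceph osd df tree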