>
> Anthony,
>
> So it sounds like I need to make a new CRUSH rule for replicated pools that specifies default-hdd and the device class? (Or should I go the other way around? I think I'd rather change the replicated pools even though there are more of them.)

I think it would be best to edit the CRUSH rules in situ so that each specifies the device class; that way, if you do get different media in the future, you'll be ready.

Rather than messing around with new rules and modifying pools, this is arguably one of the few times when one would decompile, edit, recompile, and inject the CRUSH map in toto. I haven't tried this myself, but maybe something like the below, to avoid the PITA and potential for error of editing the decompiled text file by hand.

ceph osd getcrushmap -o original.crush
crushtool -d original.crush -o original.txt
crushtool -i original.crush --reclassify --reclassify-root default hdd --set-subtree-class default hdd -o adjusted.crush
crushtool -d adjusted.crush -o adjusted.txt
crushtool -i original.crush --compare adjusted.crush
ceph osd setcrushmap -i adjusted.crush

> Then, after I create this new rule, I simply assign the pool to a new CRUSH rule using a command similar to the one shown in your note in the link you referenced?
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
>
> On Wed, Sep 18, 2024 at 2:10 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>
>>
>>>
>>> Hello,
>>>
>>> I've reviewed some recent posts in this list and also searched Google for
>>> info about autoscale and overlapping roots. In what I have found I do not
>>> see anything that I can understand regarding how to fix the issue -
>>> probably because I don't deal with CRUSH on a regular basis.
>>
>>
>> Check out the note in this section: https://docs.ceph.com/en/reef/rados/operations/placement-groups/#viewing-pg-scaling-recommendations
>>
>> I added that last year, I think, as a result of how Rook was creating pools.
>>
>>>
>>> From what I read and looking at 'ceph osd crush rule dump', it looks like
>>> the 8 replicated pools have
>>>
>>>     "op": "take",
>>>     "item": -1,
>>>     "item_name": "default"
>>>
>>> whereas the 2 EC pools have
>>>
>>>     "op": "take",
>>>     "item": -2,
>>>     "item_name": "default~hdd"
>>>
>>> To be sure, all of my OSDs are identical - HDD with SSD WAL/DB.
>>>
>>> Please advise on how to fix this.
>>
>> The subtlety that's easy to miss is that when you specify a device class for only *some* pools, the pools/rules that specify a device class effectively act on a "shadow" CRUSH root. My terminology may be inexact there.
>>
>> So I think if you adjust your CRUSH rules so that they all specify a device class -- in your case all the same device class -- your problem (and perhaps balancer performance) will improve.
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
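
For reference, the alternative Dave asks about -- creating a new replicated rule that specifies the device class and then pointing each replicated pool at it -- would look roughly like the sketch below. The rule name "replicated-hdd" and pool name "mypool" are placeholders rather than names from this thread, and the failure domain of "host" is an assumption; substitute whatever matches the cluster.

# Create a replicated rule rooted at "default", failure domain "host",
# restricted to the hdd device class:
ceph osd crush rule create-replicated replicated-hdd default host hdd

# Point one replicated pool at the new rule (repeat for each replicated pool):
ceph osd pool set mypool crush_rule replicated-hdd

# Verify which rule the pool now uses:
ceph osd pool get mypool crush_rule

Either way -- reclassifying the map with crushtool or switching the pools to a class-specific rule -- the end state is the same: every rule takes from default~hdd, so the autoscaler no longer sees overlapping roots.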