Re: hdd pg's migrating when converting ssd class osd's

Hi Frank, thanks, this 'root default' indeed looks different with these 0s 
there. I have also uploaded mine[1] because it looks very similar to 
Nico's. I guess his hdd pg's can also start moving on some occasions. 
Thanks for the 'crushtool reclassify' hint, I guess I missed this in 
the release notes or so.

[1]
https://pastebin.com/PFx0V3S7
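
If I read the Nautilus notes correctly, the reclassify workflow would be 
roughly something like this (untested on my side; the --reclassify-bucket 
pattern has to match whatever legacy ssd buckets a tree actually has):

  ceph osd getcrushmap -o original
  crushtool -i original --reclassify \
    --set-subtree-class default hdd \
    --reclassify-root default hdd \
    --reclassify-bucket %-ssd ssd default \
    -o adjusted
  # check how many mappings would change before injecting the new map
  crushtool -i original --compare adjusted
  ceph osd setcrushmap -i adjusted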



-----Original Message-----
To: Eugen Block
Cc: Marc Roos; ceph-users
Subject: Re:  Re: hdd pg's migrating when converting ssd 
class osd's

This is how my crush tree including shadow hierarchies looks (a 
mess :): https://pastebin.com/iCLbi4Up

Every device class has its own tree. Starting with mimic, this is 
automatic when creating new device classes.
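
You can inspect and use them with something like this (the rule name 
'rbd-ssd' below is just an example):

  # list device classes and the per-class shadow trees
  ceph osd crush class ls
  ceph osd crush tree --show-shadow
  # a rule pinned to one class only ever walks that class's shadow tree
  ceph osd crush rule create-replicated rbd-ssd default host ssd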

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 30 September 2020 08:43:47
To: Frank Schilder
Cc: Marc Roos; ceph-users
Subject: Re:  Re: hdd pg's migrating when converting ssd 
class osd's

Interesting, I also did this test on an upgraded cluster (L to N).
I'll repeat the test on a native Nautilus to see it for myself.


Quoting Frank Schilder:

> Somebody on this list posted a script that can convert pre-mimic crush 
> trees with buckets for different types of devices to crush trees with 
> device classes with minimal data movement (trying to maintain IDs as 
> much as possible). Don't have a thread name right now, but could try 
> to find it tomorrow.
>
> I can check tomorrow how our crush tree unfolds. Basically, for every 
> device class there is a full copy (shadow hierarchy) of the tree with 
> its own weights etc.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc Roos
> Sent: 29 September 2020 22:19:33
> To: eblock; Frank Schilder
> Cc: ceph-users
> Subject: RE:  Re: hdd pg's migrating when converting ssd 
> class osd's
>
> Yes, correct, this is coming from Luminous or maybe even Kraken. How 
> does a default crush tree look in mimic or octopus? Or is there some 
> manual on how to bring this to the new 'default'?
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: Re:  Re: hdd pg's migrating when converting ssd 
> class osd's
>
> Are these crush maps inherited from pre-mimic versions? I have 
> re-balanced SSD and HDD pools in mimic (mimic deployed) where one 
> device class never influenced the placement of the other. I have mixed 
> hosts and went as far as introducing rbd_meta, rbd_data and such 
> classes to sub-divide even further (all these devices have different 
> perf specs). This worked like a charm. When adding devices of one 
> class, only pools in this class were ever affected.
>
> As far as I understand, starting with mimic, every shadow class 
> defines a separate tree (not just leaves/OSDs). Thus, device classes 
> are independent of each other.
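> Roughly what that looked like here (the osd id and names below are 
> just examples):
>
>   # move an osd into a custom device class
>   ceph osd crush rm-device-class osd.12
>   ceph osd crush set-device-class rbd_meta osd.12
>   # and give the pool a rule restricted to that class
>   ceph osd crush rule create-replicated rbd-meta-rule default host rbd_meta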
>
>
>
> ________________________________________
> Sent: 29 September 2020 20:54:48
> To: eblock
> Cc: ceph-users
> Subject:  Re: hdd pg's migrating when converting ssd class 
> osd's
>
> Yes, correct, hosts indeed have both ssd's and hdd's combined. Is this 
> not more of a bug then? I would assume the goal of using device 
> classes is that you separate these and one does not affect the other; 
> the host weights of the ssd and hdd classes are even already 
> available. The algorithm should just use those instead of the weight 
> of the whole host. Or is there some specific use case where combining 
> these classes is required?
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: *****SPAM***** Re:  Re: hdd pg's migrating when 
> converting ssd class osd's
>
> They're still in the same root (default) and each host is a member of 
> both device-classes; I guess you have a mixed setup (hosts c01/c02 
> have both HDDs and SSDs)? I don't think this separation is enough to 
> avoid remapping even if a different device-class is affected (your 
> report confirms that).
>
> Dividing the crush tree into different subtrees might help here but 
> I'm not sure if that's really something you need. You might also just 
> deal with the remapping as long as it doesn't happen too often, I 
> guess. On the other hand, if your setup won't change (except adding 
> more OSDs) you might as well think about a different crush tree. It 
> really depends on your actual requirements.
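> For reference, a separate subtree would be created with something 
> like this (bucket and host names are only placeholders, and moving a 
> host does trigger data movement):
>
>   ceph osd crush add-bucket fastroot root
>   ceph osd crush move node01 root=fastroot
>   ceph osd crush rule create-replicated fast-rule fastroot host ssd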
>
> We created two different subtrees when we got new hardware, and it 
> helped us a lot to move the data only once to the new hardware, 
> avoiding multiple remappings. Now the older hardware is our EC 
> environment, except for some SSDs on those old hosts that had to stay 
> in the main subtree. So our setup is also very individual, but it 
> works quite nicely.
> :-)
>
>
> Quoting :
>
>> I have practically a default setup. If I do a 'ceph osd crush tree 
>> --show-shadow' I have a listing like this[1]. I would assume from the 
>> hosts being listed within the default~ssd and default~hdd, they are 
>> separate (enough)?
>>
>>
>> [1]
>> root default~ssd
>>      host c01~ssd
>> ..
>> ..
>>      host c02~ssd
>> ..
>> root default~hdd
>>      host c01~hdd
>> ..
>>      host c02~hdd
>> ..
>> root default
>>
>>
>>
>>
>> -----Original Message-----
>> To: ceph-users@xxxxxxx
>> Subject:  Re: hdd pg's migrating when converting ssd class 
>> osd's
>>
>> Are all the OSDs in the same crush root? I would think that since 
>> the crush weight of a host changes as soon as OSDs are out, it 
>> impacts the whole crush tree. If you separate the SSDs from the HDDs 
>> logically (e.g. a different bucket type in the crush tree) the 
>> remapping wouldn't affect the HDDs.
>>
>>
>>
>>
>>> I have been converting ssd osd's to dmcrypt, and I have noticed 
>>> that pg's of pools are migrated that should be (and are?) on the 
>>> hdd class.
>>>
>>> On a healthy cluster, when I set the crush reweight of an ssd osd 
>>> to 0.0, I am getting this:
>>>
>>> 17.35     10415                  0        0      9907       0
>>> 36001743890           0          0 3045     3045
>>> active+remapped+backfilling 2020-09-27 12:55:49.093054
>>> 83758'20725398 83758:100379720  [8,14,23]          8  [3,14,23]              3
>>> 83636'20718129 2020-09-27 00:58:07.098096  83300'20689151 2020-09-24
>>> 21:42:07.385360             0
>>>
>>> However, osds 3,14,23,8 are all hdd osd's.
>>>
>>> Since this is a cluster from Kraken/Luminous, I am not sure if the 
>>> device class of the replicated_ruleset[1] was set when pool 17 was 
>>> created.
>>> The weird thing is that all pg's of this pool seem to be on hdd 
>>> osds[2].
>>>
>>> Q. How can I display the definition of 'crush_rule 0' at the time 
>>> of the pool creation? (To be sure it already had this device class 
>>> hdd configured.)
>>>
>>>
>>>
>>> [1]
>>> [@~]# ceph osd pool ls detail | grep 'pool 17'
>>> pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash 
>>> rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712 
>>> flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>>
>>>
>>> [@~]# ceph osd crush rule dump replicated_ruleset
>>> {
>>>     "rule_id": 0,
>>>     "rule_name": "replicated_ruleset",
>>>     "ruleset": 0,
>>>     "type": 1,
>>>     "min_size": 1,
>>>     "max_size": 10,
>>>     "steps": [
>>>         {
>>>             "op": "take",
>>>             "item": -10,
>>>             "item_name": "default~hdd"
>>>         },
>>>         {
>>>             "op": "chooseleaf_firstn",
>>>             "num": 0,
>>>             "type": "host"
>>>         },
>>>         {
>>>             "op": "emit"
>>>         }
>>>     ]
>>> }
>>>
>>> [2]
>>> [@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' 
>>> | grep -oE '[0-9]{1,2}' | sort -u -n`; do ceph osd crush get-device-class 
>>> osd.$osd ; done | sort -u
>>> dumped pgs
>>> hdd
>
>
>




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



