This is what my crush tree, including the shadow hierarchies, looks like (a mess :): https://pastebin.com/iCLbi4Up

Every device class has its own tree. Starting with mimic, this is automatic when creating new device classes.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 30 September 2020 08:43:47
To: Frank Schilder
Cc: Marc Roos; ceph-users
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

Interesting, I also did this test on an upgraded cluster (L to N). I'll
repeat the test on a native Nautilus to see it for myself.

Zitat von Frank Schilder <frans@xxxxxx>:

> Somebody on this list posted a script that can convert pre-mimic crush
> trees with buckets for different types of devices to crush trees with
> device classes with minimal data movement (trying to maintain IDs as
> much as possible). I don't have the thread name right now, but I can
> try to find it tomorrow.
>
> I can check tomorrow how our crush tree unfolds. Basically, for every
> device class there is a full copy of the hierarchy (the shadow
> hierarchy) with its own weights etc.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: 29 September 2020 22:19:33
> To: eblock; Frank Schilder
> Cc: ceph-users
> Subject: RE: Re: hdd pg's migrating when converting ssd class osd's
>
> Yes, correct, this is coming from Luminous or maybe even Kraken. What
> does a default crush tree look like in mimic or octopus? Or is there
> some manual on how to bring this to the new 'default'?
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: Re: Re: hdd pg's migrating when converting ssd class osd's
>
> Are these crush maps inherited from pre-mimic versions? I have
> re-balanced SSD and HDD pools in mimic (mimic-deployed) where one
> device class never influenced the placement of the other. I have mixed
> hosts and went as far as introducing rbd_meta, rbd_data and similar
> classes to sub-divide even further (all these devices have different
> perf specs). This worked like a charm. When adding devices of one
> class, only pools in this class were ever affected.
>
> As far as I understand, starting with mimic, every shadow class defines
> a separate tree (not just the leaves/OSDs). Thus, device classes are
> independent of each other.
>
>
> ________________________________________
> Sent: 29 September 2020 20:54:48
> To: eblock
> Cc: ceph-users
> Subject: Re: hdd pg's migrating when converting ssd class osd's
>
> Yes, correct, the hosts indeed have both ssd's and hdd's combined. Is
> this not more of a bug then? I would assume the goal of using device
> classes is that you separate these and one does not affect the other;
> the host weights of the ssd and hdd classes are even already available.
> The algorithm should just use those instead of the weight of the whole
> host. Or is there some specific use case where combining these classes
> is required?
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: *****SPAM***** Re: Re: hdd pg's migrating when converting ssd
> class osd's
>
> They're still in the same root (default) and each host is a member of
> both device classes; I guess you have a mixed setup (hosts c01/c02 have
> both HDDs and SSDs)? I don't think this separation is enough to avoid
> remapping even if a different device class is affected (your report
> confirms that).
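For reference, whether a reweight in one device class really leaves the
other class's shadow buckets untouched can be checked directly from the
shadow tree. A minimal sketch, assuming you repeat the test described
further down (the OSD id and file names here are placeholders, not taken
from the thread):

# dump the crush tree including the per-class shadow buckets
ceph osd crush tree --show-shadow > shadow_before.txt
# reweight one ssd OSD, as in the reported test
ceph osd crush reweight osd.12 0.0
ceph osd crush tree --show-shadow > shadow_after.txt
# only the ~ssd shadow buckets (and the plain, class-less buckets)
# should differ; a change under default~hdd would explain hdd PG movement
diff shadow_before.txt shadow_after.txt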
> Dividing the crush tree into different subtrees might help here, but
> I'm not sure if that's really something you need. You might also just
> deal with the remapping as long as it doesn't happen too often, I
> guess. On the other hand, if your setup won't change (except adding
> more OSDs), you might as well think about a different crush tree. It
> really depends on your actual requirements.
>
> We created two different subtrees when we got new hardware, and it
> helped us a lot: we moved the data to the new hardware only once,
> avoiding multiple remappings. Now the older hardware is our EC
> environment, except for some SSDs on those old hosts that had to stay
> in the main subtree. So our setup is also very individual, but it works
> quite nicely. :-)
>
>
> Zitat von :
>
>> I have practically a default setup. If I do a 'ceph osd crush tree
>> --show-shadow' I get a listing like this [1]. I would assume, from the
>> hosts being listed within default~ssd and default~hdd, that they are
>> separate (enough)?
>>
>>
>> [1]
>> root default~ssd
>>     host c01~ssd
>>     ..
>>     ..
>>     host c02~ssd
>>     ..
>> root default~hdd
>>     host c01~hdd
>>     ..
>>     host c02~hdd
>>     ..
>> root default
>>
>>
>> -----Original Message-----
>> To: ceph-users@xxxxxxx
>> Subject: Re: hdd pg's migrating when converting ssd class osd's
>>
>> Are all the OSDs in the same crush root? I would think that, since the
>> crush weight of hosts changes as soon as OSDs are out, it impacts the
>> whole crush tree. If you separate the SSDs from the HDDs logically
>> (e.g. a different bucket type in the crush tree), the remapping
>> wouldn't affect the HDDs.
>>
>>
>>> I have been converting ssd osd's to dmcrypt, and I have noticed that
>>> pg's of pools are migrated that should be (and are?) on the hdd
>>> class.
>>>
>>> On an otherwise healthy cluster, when I set the crush reweight of an
>>> ssd osd to 0.0, I get this:
>>>
>>> 17.35  10415  0  0  9907  0  36001743890  0  0  3045  3045
>>> active+remapped+backfilling  2020-09-27 12:55:49.093054
>>> 83758'20725398  83758:100379720  [8,14,23]  8  [3,14,23]  3
>>> 83636'20718129  2020-09-27 00:58:07.098096  83300'20689151
>>> 2020-09-24 21:42:07.385360  0
>>>
>>> However, osds 3, 14, 23 and 8 are all hdd osd's.
>>>
>>> Since this is a cluster from Kraken/Luminous, I am not sure if the
>>> device class of the replicated_ruleset [1] was set when pool 17 was
>>> created. The weird thing is that all pg's of this pool seem to be on
>>> hdd osds [2].
>>>
>>> Q. How can I display the definition of 'crush_rule 0' at the time of
>>> the pool creation? (To be sure it already had this device class hdd
>>> configured.)
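One way that may answer this question, assuming the monitors still keep
an osdmap epoch from around the time the pool was created (old maps get
trimmed, so this often only works for recent history; the epoch number
below is made up): fetch the old osdmap, extract its crush map, and
decompile it.

# fetch an old osdmap epoch and pull the crush map out of it
ceph osd getmap 83000 -o osdmap.83000
osdmaptool osdmap.83000 --export-crush crush.83000
crushtool -d crush.83000 -o crush.83000.txt
# inspect the rule as it was in that epoch
grep -A 12 'rule replicated_ruleset' crush.83000.txt

If no sufficiently old epoch is available, the current rule dump and the
per-OSD device-class check (as in [1] and [2] below) are the best
remaining evidence.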
>>> [1]
>>> [@~]# ceph osd pool ls detail | grep 'pool 17'
>>> pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
>>> flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>>
>>> [@~]# ceph osd crush rule dump replicated_ruleset
>>> {
>>>     "rule_id": 0,
>>>     "rule_name": "replicated_ruleset",
>>>     "ruleset": 0,
>>>     "type": 1,
>>>     "min_size": 1,
>>>     "max_size": 10,
>>>     "steps": [
>>>         {
>>>             "op": "take",
>>>             "item": -10,
>>>             "item_name": "default~hdd"
>>>         },
>>>         {
>>>             "op": "chooseleaf_firstn",
>>>             "num": 0,
>>>             "type": "host"
>>>         },
>>>         {
>>>             "op": "emit"
>>>         }
>>>     ]
>>> }
>>>
>>> [2]
>>> [@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' \
>>>     | grep -oE '[0-9]{1,2}' | sort -u -n`; do \
>>>     ceph osd crush get-device-class osd.$osd; done | sort -u
>>> dumped pgs
>>> hdd
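On the earlier question of bringing a pre-mimic tree to the new
per-class layout: whether or not it is the script Frank refers to,
crushtool in Nautilus and later has a --reclassify mode that converts
legacy per-device-type buckets into device classes while trying to keep
bucket IDs stable and data movement minimal. A rough sketch following
the upstream documentation; the '%-ssd' bucket-name pattern is only an
assumption about how the legacy SSD buckets are named and would need
adjusting to the actual tree:

# export the current crush map
ceph osd getcrushmap -o original
# mark everything under 'default' as hdd and fold the legacy
# '<host>-ssd' buckets into the ssd class under the same root
crushtool -i original --reclassify \
  --set-subtree-class default hdd \
  --reclassify-root default hdd \
  --reclassify-bucket %-ssd ssd default \
  -o adjusted
# check how many PG mappings would change before injecting the new map
crushtool -i original --compare adjusted
ceph osd setcrushmap -i adjusted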