> To me it looks like the structure of both maps is pretty much the same -
> or am I mistaken?

Yes, but you are not Marc Roos. Do you work on the same cluster or do you
observe the same problem?

In any case, here is a thread pointing to the crush tree/rule conversion I
mentioned:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/675QZ2JXXX4RPRNPK2NL7FB5MVANKUB2/#675QZ2JXXX4RPRNPK2NL7FB5MVANKUB2

The tool is "crushtool reclassify", and its use is recommended when
upgrading from Luminous to a newer release to convert crush rules to use
device classes [a rough sketch of the workflow is appended at the end of
this thread].

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
Sent: 30 September 2020 09:12:49
To: Frank Schilder
Cc: Eugen Block; Marc Roos; ceph-users@xxxxxxx
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

Hey Frank,

I uploaded our Kraken-created and Nautilus-upgraded crush map at [0].

To me it looks like the structure of both maps is pretty much the same -
or am I mistaken?

Best regards,

Nico

[0] https://www.nico.schottelius.org/temp/ceph-shadowtree20200930

Frank Schilder <frans@xxxxxx> writes:

> This is what my crush tree including shadow hierarchies looks like (a mess :):
> https://pastebin.com/iCLbi4Up
>
> Every device class has its own tree. Starting with mimic, this is
> automatic when creating new device classes.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: 30 September 2020 08:43:47
> To: Frank Schilder
> Cc: Marc Roos; ceph-users
> Subject: Re: Re: hdd pg's migrating when converting ssd class osd's
>
> Interesting, I also did this test on an upgraded cluster (L to N).
> I'll repeat the test on a native Nautilus to see it for myself.
>
>
> Quoting Frank Schilder <frans@xxxxxx>:
>
>> Somebody on this list posted a script that can convert pre-mimic
>> crush trees with buckets for different types of devices to crush
>> trees with device classes with minimal data movement (trying to
>> maintain IDs as much as possible). I don't have a thread name right
>> now, but I could try to find it tomorrow.
>>
>> I can check tomorrow how our crush tree unfolds. Basically, for
>> every device class there is a full copy of the tree (shadow
>> hierarchy) with its own weights etc.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>> Sent: 29 September 2020 22:19:33
>> To: eblock; Frank Schilder
>> Cc: ceph-users
>> Subject: RE: Re: hdd pg's migrating when converting ssd
>> class osd's
>>
>> Yes, correct, this is coming from Luminous or maybe even Kraken. What
>> does a default crush tree look like in Mimic or Octopus? Or is there
>> some manual on how to bring this to the new 'default'?
>>
>>
>> -----Original Message-----
>> Cc: ceph-users
>> Subject: Re: Re: hdd pg's migrating when converting ssd
>> class osd's
>>
>> Are these crush maps inherited from pre-mimic versions? I have
>> re-balanced SSD and HDD pools in mimic (mimic deployed) where one
>> device class never influenced the placement of the other. I have mixed
>> hosts and went as far as introducing rbd_meta, rbd_data and similar
>> classes to sub-divide even further (all these devices have different
>> perf specs). This worked like a charm. When adding devices of one
>> class, only pools in this class were ever affected.
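[As a minimal sketch of the kind of per-class setup Frank describes above:
the OSD id (osd.12), the custom class name (rbd_meta), the rule name and
the pool name are placeholders, not values from any cluster in this thread.

    # an OSD has to leave its current device class before it can get a new one
    ceph osd crush rm-device-class osd.12
    ceph osd crush set-device-class rbd_meta osd.12

    # replicated rule restricted to that class: <rule> <root> <failure-domain> <class>
    ceph osd crush rule create-replicated rbd-meta-rule default host rbd_meta

    # point a pool at the class-restricted rule
    ceph osd pool set somepool crush_rule rbd-meta-rule

    # the new shadow tree (default~rbd_meta) should now appear here
    ceph osd crush tree --show-shadow

Each device class gets its own shadow hierarchy with its own weights, which
is why pools restricted to one class are not supposed to be affected by
changes in another.]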
>>
>> As far as I understand, starting with mimic, every shadow class defines
>> a separate tree (not just leafs/OSDs). Thus, device classes are
>> independent of each other.
>>
>>
>>
>> ________________________________________
>> Sent: 29 September 2020 20:54:48
>> To: eblock
>> Cc: ceph-users
>> Subject: Re: hdd pg's migrating when converting ssd class
>> osd's
>>
>> Yes, correct, hosts do indeed have both SSDs and HDDs combined. Is this
>> not more of a bug then? I would assume the goal of using device classes
>> is that you separate these and one does not affect the other; the host
>> weights for the ssd and hdd classes are even already available. The
>> algorithm should just use those instead of the weight of the whole
>> host. Or is there some specific use case where combining these classes
>> is required?
>>
>>
>> -----Original Message-----
>> Cc: ceph-users
>> Subject: *****SPAM***** Re: Re: hdd pg's migrating when
>> converting ssd class osd's
>>
>> They're still in the same root (default) and each host is a member of
>> both device classes; I guess you have a mixed setup (hosts c01/c02 have
>> both HDDs and SSDs)? I don't think this separation is enough to avoid
>> remapping even if a different device class is affected (your report
>> confirms that).
>>
>> Dividing the crush tree into different subtrees might help here, but
>> I'm not sure if that's really something you need. You might also just
>> deal with the remapping as long as it doesn't happen too often, I
>> guess. On the other hand, if your setup won't change (except for adding
>> more OSDs) you might as well think about a different crush tree. It
>> really depends on your actual requirements.
>>
>> We created two different subtrees when we got new hardware, and it
>> helped us a lot to move the data to the new hardware only once,
>> avoiding multiple remappings. Now the older hardware is our EC
>> environment, except for some SSDs on those old hosts that had to stay
>> in the main subtree. So our setup is also very individual, but it works
>> quite nicely.
>> :-)
>>
>>
>> Quoting:
>>
>>> I have practically a default setup. If I do a 'ceph osd crush tree
>>> --show-shadow' I get a listing like this [1]. I would assume, from the
>>> hosts being listed within default~ssd and default~hdd, that they are
>>> separate (enough)?
>>>
>>>
>>> [1]
>>> root default~ssd
>>>     host c01~ssd
>>>     ..
>>>     ..
>>>     host c02~ssd
>>>     ..
>>> root default~hdd
>>>     host c01~hdd
>>>     ..
>>>     host c02~hdd
>>>     ..
>>> root default
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> To: ceph-users@xxxxxxx
>>> Subject: Re: hdd pg's migrating when converting ssd class
>>> osd's
>>>
>>> Are all the OSDs in the same crush root? I would think that, since the
>>> crush weight of a host changes as soon as its OSDs are out, it impacts
>>> the whole crush tree. If you separate the SSDs from the HDDs logically
>>> (e.g. a different bucket type in the crush tree) the remapping
>>> wouldn't affect the HDDs.
>>>
>>>
>>>
>>>
>>>> I have been converting SSD OSDs to dmcrypt, and I have noticed that
>>>> PGs of pools are migrated that should be (and are?) on the hdd class.
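[A minimal way to reproduce and confirm this kind of cross-class remapping,
assuming the hdd-backed pool is the 'rbd' pool (id 17) shown further down;
osd.N stands for the SSD OSD being drained and is a placeholder:

    # record the PG mappings of the hdd pool before touching the SSD OSD
    ceph pg ls-by-pool rbd > pgs-before.txt

    # take the SSD OSD out of data placement, as described above
    ceph osd crush reweight osd.N 0

    # record the mappings again and compare; any changed up/acting sets
    # are hdd PGs being remapped even though only an SSD OSD was reweighted
    ceph pg ls-by-pool rbd > pgs-after.txt
    diff pgs-before.txt pgs-after.txt

This is essentially what the pg dump line below shows for PG 17.35.]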
>>>>
>>>> On a healthy cluster, when I set the crush reweight of an SSD OSD to
>>>> 0.0, I am getting this:
>>>>
>>>> 17.35    10415    0    0    9907    0    36001743890    0    0
>>>> 3045    3045    active+remapped+backfilling
>>>> 2020-09-27 12:55:49.093054    83758'20725398    83758:100379720
>>>> [8,14,23]    8    [3,14,23]    3    83636'20718129
>>>> 2020-09-27 00:58:07.098096    83300'20689151
>>>> 2020-09-24 21:42:07.385360    0
>>>>
>>>> However, OSDs 3, 14, 23 and 8 are all hdd OSDs.
>>>>
>>>> Since this is a cluster from Kraken/Luminous, I am not sure if the
>>>> device class of the replicated_ruleset [1] was set when pool 17 was
>>>> created.
>>>> The weird thing is that all PGs of this pool seem to be on hdd OSDs [2].
>>>>
>>>> Q. How can I display the definition of 'crush_rule 0' at the time of
>>>> the pool creation? (To be sure it already had this device class hdd
>>>> configured.)
>>>>
>>>>
>>>>
>>>> [1]
>>>> [@~]# ceph osd pool ls detail | grep 'pool 17'
>>>> pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>>>> rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
>>>> flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>>>
>>>>
>>>> [@~]# ceph osd crush rule dump replicated_ruleset
>>>> {
>>>>     "rule_id": 0,
>>>>     "rule_name": "replicated_ruleset",
>>>>     "ruleset": 0,
>>>>     "type": 1,
>>>>     "min_size": 1,
>>>>     "max_size": 10,
>>>>     "steps": [
>>>>         {
>>>>             "op": "take",
>>>>             "item": -10,
>>>>             "item_name": "default~hdd"
>>>>         },
>>>>         {
>>>>             "op": "chooseleaf_firstn",
>>>>             "num": 0,
>>>>             "type": "host"
>>>>         },
>>>>         {
>>>>             "op": "emit"
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>> [2]
>>>> [@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' | grep -oE '[0-9]{1,2}' | sort -u -n`; do ceph osd crush get-device-class osd.$osd; done | sort -u
>>>> dumped pgs
>>>> hdd
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
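[Appendix: a rough sketch of the "crushtool reclassify" workflow Frank
points to at the top of this thread, following the pattern in the upstream
Ceph documentation. The file names are arbitrary, and the class names and
bucket pattern ('hdd', 'ssd', '%-ssd') are assumptions that have to be
adapted to how the legacy crush map actually names its buckets.

    # save the current (compiled) crush map
    ceph osd getcrushmap -o original.map

    # convert legacy parallel ssd/hdd buckets and rules into device classes;
    # the --reclassify-bucket pattern must match the old map's bucket naming
    crushtool -i original.map --reclassify \
        --set-subtree-class default hdd \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        -o adjusted.map

    # estimate how many PG mappings would change before injecting the new map
    crushtool -i original.map --compare adjusted.map

    # only when the comparison looks sane:
    ceph osd setcrushmap -i adjusted.map

The --compare step is what makes it possible to verify that the conversion
causes little or no data movement before the adjusted map goes live.]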