> To me it looks like the structure of both maps is pretty much the same -
> or am I mistaken?

Yes, but you are not Marc Roos. Do you work on the same cluster or do you
observe the same problem?

In any case, here is a thread pointing to the crush tree/rule conversion I
mentioned:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/675QZ2JXXX4RPRNPK2NL7FB5MVANKUB2/#675QZ2JXXX4RPRNPK2NL7FB5MVANKUB2

The tool is "crushtool reclassify", and its use is recommended when
upgrading from Luminous to a newer release to convert crush rules to use
device classes [a rough sketch of the workflow is appended at the end of
this thread].

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
Sent: 30 September 2020 09:12:49
To: Frank Schilder
Cc: Eugen Block; Marc Roos; ceph-users@xxxxxxx
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

Hey Frank,

I uploaded our Kraken-created and Nautilus-upgraded crush map at [0].

To me it looks like the structure of both maps is pretty much the same -
or am I mistaken?

Best regards,

Nico

[0] https://www.nico.schottelius.org/temp/ceph-shadowtree20200930

Frank Schilder <frans@xxxxxx> writes:

> This is what my crush tree including shadow hierarchies looks like (a mess :):
> https://pastebin.com/iCLbi4Up
>
> Every device class has its own tree. Starting with mimic, this is
> automatic when creating new device classes.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: 30 September 2020 08:43:47
> To: Frank Schilder
> Cc: Marc Roos; ceph-users
> Subject: Re: Re: hdd pg's migrating when converting ssd class osd's
>
> Interesting, I also did this test on an upgraded cluster (L to N).
> I'll repeat the test on a native Nautilus to see it for myself.
>
>
> Quoting Frank Schilder <frans@xxxxxx>:
>
>> Somebody on this list posted a script that can convert pre-mimic
>> crush trees with buckets for different types of devices to crush
>> trees with device classes with minimal data movement (trying to
>> maintain IDs as much as possible). I don't have a thread name right
>> now, but I could try to find it tomorrow.
>>
>> I can check tomorrow how our crush tree unfolds. Basically, for
>> every device class there is a full copy of the tree (shadow
>> hierarchy) with its own weights etc.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>> Sent: 29 September 2020 22:19:33
>> To: eblock; Frank Schilder
>> Cc: ceph-users
>> Subject: RE: Re: hdd pg's migrating when converting ssd
>> class osd's
>>
>> Yes, correct, this is coming from Luminous or maybe even Kraken. What
>> does a default crush tree look like in Mimic or Octopus? Or is there
>> some manual on how to bring this to the new 'default'?
>>
>>
>> -----Original Message-----
>> Cc: ceph-users
>> Subject: Re: Re: hdd pg's migrating when converting ssd
>> class osd's
>>
>> Are these crush maps inherited from pre-mimic versions? I have
>> re-balanced SSD and HDD pools in mimic (mimic deployed) where one
>> device class never influenced the placement of the other. I have mixed
>> hosts and went as far as introducing rbd_meta, rbd_data and similar
>> classes to sub-divide even further (all these devices have different
>> perf specs). This worked like a charm. When adding devices of one
>> class, only pools in this class were ever affected.
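[As a minimal sketch of the kind of per-class setup Frank describes above:
the OSD id (osd.12), the custom class name (rbd_meta), the rule name and
the pool name are placeholders, not values from any cluster in this thread.

    # an OSD has to leave its current device class before it can get a new one
    ceph osd crush rm-device-class osd.12
    ceph osd crush set-device-class rbd_meta osd.12

    # replicated rule restricted to that class: <rule> <root> <failure-domain> <class>
    ceph osd crush rule create-replicated rbd-meta-rule default host rbd_meta

    # point a pool at the class-restricted rule
    ceph osd pool set somepool crush_rule rbd-meta-rule

    # the new shadow tree (default~rbd_meta) should now appear here
    ceph osd crush tree --show-shadow

Each device class gets its own shadow hierarchy with its own weights, which
is why pools restricted to one class are not supposed to be affected by
changes in another.]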
>>
>> As far as I understand, starting with mimic, every shadow class defines
>> a separate tree (not just leafs/OSDs). Thus, device classes are
>> independent of each other.
>>
>>
>>
>> ________________________________________
>> Sent: 29 September 2020 20:54:48
>> To: eblock
>> Cc: ceph-users
>> Subject: Re: hdd pg's migrating when converting ssd class
>> osd's
>>
>> Yes, correct, hosts do indeed have both SSDs and HDDs combined. Is this
>> not more of a bug then? I would assume the goal of using device classes
>> is that you separate these and one does not affect the other; the host
>> weights for the ssd and hdd classes are even already available. The
>> algorithm should just use those instead of the weight of the whole
>> host. Or is there some specific use case where combining these classes
>> is required?
>>
>>
>> -----Original Message-----
>> Cc: ceph-users
>> Subject: *****SPAM***** Re: Re: hdd pg's migrating when
>> converting ssd class osd's
>>
>> They're still in the same root (default) and each host is a member of
>> both device classes; I guess you have a mixed setup (hosts c01/c02 have
>> both HDDs and SSDs)? I don't think this separation is enough to avoid
>> remapping even if a different device class is affected (your report
>> confirms that).
>>
>> Dividing the crush tree into different subtrees might help here, but
>> I'm not sure if that's really something you need. You might also just
>> deal with the remapping as long as it doesn't happen too often, I
>> guess. On the other hand, if your setup won't change (except for adding
>> more OSDs) you might as well think about a different crush tree. It
>> really depends on your actual requirements.
>>
>> We created two different subtrees when we got new hardware, and it
>> helped us a lot to move the data to the new hardware only once,
>> avoiding multiple remappings. Now the older hardware is our EC
>> environment, except for some SSDs on those old hosts that had to stay
>> in the main subtree. So our setup is also very individual, but it works
>> quite nicely.
>> :-)
>>
>>
>> Quoting:
>>
>>> I have practically a default setup. If I do a 'ceph osd crush tree
>>> --show-shadow' I get a listing like this [1]. I would assume, from the
>>> hosts being listed within default~ssd and default~hdd, that they are
>>> separate (enough)?
>>>
>>>
>>> [1]
>>> root default~ssd
>>>     host c01~ssd
>>>     ..
>>>     ..
>>>     host c02~ssd
>>>     ..
>>> root default~hdd
>>>     host c01~hdd
>>>     ..
>>>     host c02~hdd
>>>     ..
>>> root default
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> To: ceph-users@xxxxxxx
>>> Subject: Re: hdd pg's migrating when converting ssd class
>>> osd's
>>>
>>> Are all the OSDs in the same crush root? I would think that, since the
>>> crush weight of a host changes as soon as its OSDs are out, it impacts
>>> the whole crush tree. If you separate the SSDs from the HDDs logically
>>> (e.g. a different bucket type in the crush tree) the remapping
>>> wouldn't affect the HDDs.
>>>
>>>
>>>
>>>
>>>> I have been converting SSD OSDs to dmcrypt, and I have noticed that
>>>> PGs of pools are migrated that should be (and are?) on the hdd class.
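[A minimal way to reproduce and confirm this kind of cross-class remapping,
assuming the hdd-backed pool is the 'rbd' pool (id 17) shown further down;
osd.N stands for the SSD OSD being drained and is a placeholder:

    # record the PG mappings of the hdd pool before touching the SSD OSD
    ceph pg ls-by-pool rbd > pgs-before.txt

    # take the SSD OSD out of data placement, as described above
    ceph osd crush reweight osd.N 0

    # record the mappings again and compare; any changed up/acting sets
    # are hdd PGs being remapped even though only an SSD OSD was reweighted
    ceph pg ls-by-pool rbd > pgs-after.txt
    diff pgs-before.txt pgs-after.txt

This is essentially what the pg dump line below shows for PG 17.35.]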
>>>>
>>>> On a healthy cluster, when I set the crush reweight of an SSD OSD to
>>>> 0.0, I am getting this:
>>>>
>>>> 17.35    10415    0    0    9907    0    36001743890    0    0
>>>> 3045    3045    active+remapped+backfilling
>>>> 2020-09-27 12:55:49.093054    83758'20725398    83758:100379720
>>>> [8,14,23]    8    [3,14,23]    3    83636'20718129
>>>> 2020-09-27 00:58:07.098096    83300'20689151
>>>> 2020-09-24 21:42:07.385360    0
>>>>
>>>> However, OSDs 3, 14, 23 and 8 are all hdd OSDs.
>>>>
>>>> Since this is a cluster from Kraken/Luminous, I am not sure if the
>>>> device class of the replicated_ruleset [1] was set when pool 17 was
>>>> created.
>>>> The weird thing is that all PGs of this pool seem to be on hdd OSDs [2].
>>>>
>>>> Q. How can I display the definition of 'crush_rule 0' at the time of
>>>> the pool creation? (To be sure it already had this device class hdd
>>>> configured.)
>>>>
>>>>
>>>>
>>>> [1]
>>>> [@~]# ceph osd pool ls detail | grep 'pool 17'
>>>> pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>>>> rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
>>>> flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>>>
>>>>
>>>> [@~]# ceph osd crush rule dump replicated_ruleset
>>>> {
>>>>     "rule_id": 0,
>>>>     "rule_name": "replicated_ruleset",
>>>>     "ruleset": 0,
>>>>     "type": 1,
>>>>     "min_size": 1,
>>>>     "max_size": 10,
>>>>     "steps": [
>>>>         {
>>>>             "op": "take",
>>>>             "item": -10,
>>>>             "item_name": "default~hdd"
>>>>         },
>>>>         {
>>>>             "op": "chooseleaf_firstn",
>>>>             "num": 0,
>>>>             "type": "host"
>>>>         },
>>>>         {
>>>>             "op": "emit"
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>> [2]
>>>> [@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' | grep -oE '[0-9]{1,2}' | sort -u -n`; do ceph osd crush get-device-class osd.$osd; done | sort -u
>>>> dumped pgs
>>>> hdd
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
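[Appendix: a rough sketch of the "crushtool reclassify" workflow Frank
points to at the top of this thread, following the pattern in the upstream
Ceph documentation. The file names are arbitrary, and the class names and
bucket pattern ('hdd', 'ssd', '%-ssd') are assumptions that have to be
adapted to how the legacy crush map actually names its buckets.

    # save the current (compiled) crush map
    ceph osd getcrushmap -o original.map

    # convert legacy parallel ssd/hdd buckets and rules into device classes;
    # the --reclassify-bucket pattern must match the old map's bucket naming
    crushtool -i original.map --reclassify \
        --set-subtree-class default hdd \
        --reclassify-root default hdd \
        --reclassify-bucket %-ssd ssd default \
        -o adjusted.map

    # estimate how many PG mappings would change before injecting the new map
    crushtool -i original.map --compare adjusted.map

    # only when the comparison looks sane:
    ceph osd setcrushmap -i adjusted.map

The --compare step is what makes it possible to verify that the conversion
causes little or no data movement before the adjusted map goes live.]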