Hi Frank, thanks, this 'root default' indeed looks different with these
0s there. I have also uploaded mine[1] because it looks very similar to
Nico's. I guess his hdd pg's can also start moving on some occasions.
Thanks for the 'crushtool reclassify' hint, I guess I missed this in
the release notes or somewhere.

[1] https://pastebin.com/PFx0V3S7


-----Original Message-----
To: Eugen Block
Cc: Marc Roos; ceph-users
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

This is how my crush tree including shadow hierarchies looks (a mess
:): https://pastebin.com/iCLbi4Up

Every device class has its own tree. Starting with mimic, this is
automatic when creating new device classes.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 30 September 2020 08:43:47
To: Frank Schilder
Cc: Marc Roos; ceph-users
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

Interesting, I also did this test on an upgraded cluster (L to N). I'll
repeat the test on a native Nautilus to see it for myself.

Zitat von Frank Schilder

> Somebody on this list posted a script that can convert pre-mimic crush
> trees with buckets for different types of devices to crush trees with
> device classes with minimal data movement (trying to maintain IDs as
> much as possible). I don't have the thread name right now, but I could
> try to find it tomorrow.
>
> I can check tomorrow how our crush tree unfolds. Basically, for every
> device class there is a full copy (shadow hierarchy) with its own
> weights etc.
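
For reference, the 'crushtool reclassify' workflow mentioned above
(documented in the Nautilus release notes and crush-map docs) looks
roughly like the sketch below; the legacy bucket pattern '%-ssd' is
just an example, it has to match whatever names an old map actually
uses:

```shell
# Grab the current (binary) crush map
ceph osd getcrushmap -o original

# Reclassify legacy per-device-type buckets into device classes:
# '--reclassify-root default hdd' marks everything under root 'default'
# as class hdd; '--reclassify-bucket %-ssd ssd default' maps buckets
# like 'c01-ssd' onto class ssd under the matching parent in 'default'.
crushtool -i original --reclassify \
    --reclassify-root default hdd \
    --reclassify-bucket %-ssd ssd default \
    -o adjusted

# Check how many PG mappings would change before injecting the new map
crushtool -i original --compare adjusted
ceph osd setcrushmap -i adjusted
```

The --compare step is the important part: it lets you estimate the data
movement before committing the new map to the cluster.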
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Marc Roos
> Sent: 29 September 2020 22:19:33
> To: eblock; Frank Schilder
> Cc: ceph-users
> Subject: RE: Re: hdd pg's migrating when converting ssd class osd's
>
> Yes correct, this is coming from Luminous or maybe even Kraken. What
> does a default crush tree look like in mimic or octopus? Or is there
> some manual on how to bring this to the new 'default'?
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: Re: Re: hdd pg's migrating when converting ssd class osd's
>
> Are these crush maps inherited from pre-mimic versions? I have
> re-balanced SSD and HDD pools in mimic (mimic deployed) where one
> device class never influenced the placement of the other. I have mixed
> hosts and went as far as introducing rbd_meta, rbd_data and such
> classes to sub-divide even further (all these devices have different
> perf specs). This worked like a charm. When adding devices of one
> class, only pools in this class were ever affected.
>
> As far as I understand, starting with mimic, every shadow class
> defines a separate tree (not just leaves/OSDs). Thus, device classes
> are independent of each other.
>
>
> ________________________________________
> Sent: 29 September 2020 20:54:48
> To: eblock
> Cc: ceph-users
> Subject: Re: hdd pg's migrating when converting ssd class osd's
>
> Yes correct, hosts indeed have both ssd's and hdd's combined. Is this
> not more of a bug then? I would assume the goal of using device
> classes is that you separate these and one does not affect the other;
> the host weights of the ssd and hdd classes are even already available
> separately. The algorithm should just use those instead of the weight
> of the whole host. Or is there some specific use case where keeping
> these classes combined is required?
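
For context, pinning pools to a device class so that only one class
serves their PGs looks roughly like this (the rule name
'replicated_hdd' and pool 'rbd' are just example names):

```shell
# Create a replicated rule restricted to the hdd device class,
# rooted at 'default', with host as the failure domain
ceph osd crush rule create-replicated replicated_hdd default host hdd

# Point an existing pool at it; only hdd OSDs will then serve its PGs
ceph osd pool set rbd crush_rule replicated_hdd

# Inspect the per-class shadow trees the rule draws from
ceph osd crush tree --show-shadow
```

Note that switching a pool's rule will itself trigger a one-time
remapping if the pool's current placement doesn't already match.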
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: *****SPAM***** Re: Re: hdd pg's migrating when converting
> ssd class osd's
>
> They're still in the same root (default) and each host is a member of
> both device classes; I guess you have a mixed setup (hosts c01/c02
> have both HDDs and SSDs)? I don't think this separation is enough to
> avoid remapping even if a different device class is affected (your
> report confirms that).
>
> Dividing the crush tree into different subtrees might help here, but
> I'm not sure if that's really something you need. You might also just
> deal with the remapping as long as it doesn't happen too often, I
> guess. On the other hand, if your setup won't change (except for
> adding more OSDs) you might as well think about a different crush
> tree. It really depends on your actual requirements.
>
> We created two different subtrees when we got new hardware, and it
> helped us a lot to move the data only once to the new hardware,
> avoiding multiple remappings. Now the older hardware is our EC
> environment, except for some SSDs on those old hosts that had to stay
> in the main subtree. So our setup is also very individual, but it
> works quite nicely. :-)
>
>
> Zitat von :
>
>> I have practically a default setup. If I do a 'ceph osd crush tree
>> --show-shadow' I get a listing like this[1]. I would assume from the
>> hosts being listed within default~ssd and default~hdd that they are
>> separate (enough)?
>>
>>
>> [1]
>> root default~ssd
>>     host c01~ssd
>>     ..
>>     ..
>>     host c02~ssd
>>     ..
>> root default~hdd
>>     host c01~hdd
>>     ..
>>     host c02~hdd
>>     ..
>> root default
>>
>>
>>
>>
>> -----Original Message-----
>> To: ceph-users@xxxxxxx
>> Subject: Re: hdd pg's migrating when converting ssd class osd's
>>
>> Are all the OSDs in the same crush root? I would think that since the
>> crush weight of a host changes as soon as its OSDs are out, it
>> impacts the whole crush tree. If you separate the SSDs from the HDDs
>> logically (e.g.
>> different bucket type in the crush tree), the remapping wouldn't
>> affect the HDDs.
>>
>>
>>
>>> I have been converting ssd osd's to dmcrypt, and I have noticed
>>> that pg's of pools are migrated that should be (and are?) on the
>>> hdd class.
>>>
>>> On an otherwise healthy cluster, when I set the crush reweight of
>>> an ssd osd to 0.0, I get this:
>>>
>>> 17.35  10415  0  0  9907  0  36001743890  0  0  3045  3045
>>> active+remapped+backfilling  2020-09-27 12:55:49.093054
>>> 83758'20725398  83758:100379720  [8,14,23]  8  [3,14,23]  3
>>> 83636'20718129  2020-09-27 00:58:07.098096  83300'20689151
>>> 2020-09-24 21:42:07.385360  0
>>>
>>> However, osds 3, 14, 23 and 8 are all hdd osd's.
>>>
>>> Since this cluster dates from Kraken/Luminous, I am not sure if the
>>> device class of the replicated_ruleset[1] was set when pool 17 was
>>> created. The weird thing is that all pg's of this pool seem to be
>>> on hdd osds[2].
>>>
>>> Q. How can I display the definition of 'crush_rule 0' at the time
>>> of the pool creation?
>>> (To be sure it already had this hdd device class configured.)
>>>
>>>
>>>
>>> [1]
>>> [@~]# ceph osd pool ls detail | grep 'pool 17'
>>> pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
>>> rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
>>> flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>>>
>>>
>>> [@~]# ceph osd crush rule dump replicated_ruleset
>>> {
>>>     "rule_id": 0,
>>>     "rule_name": "replicated_ruleset",
>>>     "ruleset": 0,
>>>     "type": 1,
>>>     "min_size": 1,
>>>     "max_size": 10,
>>>     "steps": [
>>>         {
>>>             "op": "take",
>>>             "item": -10,
>>>             "item_name": "default~hdd"
>>>         },
>>>         {
>>>             "op": "chooseleaf_firstn",
>>>             "num": 0,
>>>             "type": "host"
>>>         },
>>>         {
>>>             "op": "emit"
>>>         }
>>>     ]
>>> }
>>>
>>> [2]
>>> [@~]# for osd in `ceph pg dump pgs | grep '^17' \
>>>     | awk '{print $17" "$19}' | grep -oE '[0-9]{1,2}' | sort -u -n`; \
>>>     do ceph osd crush get-device-class osd.$osd; done | sort -u
>>> dumped pgs
>>> hdd
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
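
Regarding question Q above: there is no direct command to show a rule
"as of pool creation", but if the monitors still hold an osdmap epoch
from around that time (they trim old epochs, so this may not reach back
far on a long-lived cluster), the crush rule can be extracted from a
historical osdmap. A sketch, where epoch 83000 is only an example:

```shell
# Fetch an old osdmap by epoch (83000 is a placeholder epoch)
ceph osd getmap 83000 -o osdmap.83000

# Extract the binary crush map from it and decompile it to text
osdmaptool osdmap.83000 --export-crush crush.83000
crushtool -d crush.83000 -o crush.83000.txt

# The rules, including any 'take default class hdd' step, are then
# visible in the decompiled text
grep -A 10 'rule replicated_ruleset' crush.83000.txt
```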