Re: hdd pg's migrating when converting ssd class osd's


 



Interesting, I also did this test on an upgraded cluster (L to N). I'll repeat the test on a natively deployed Nautilus cluster to see it for myself.


Quoting Frank Schilder <frans@xxxxxx>:

Somebody on this list posted a script that can convert pre-mimic crush trees (with separate buckets for different device types) to crush trees with device classes, with minimal data movement (it tries to preserve bucket IDs as much as possible). I don't have the thread name right now, but I can try to find it tomorrow.
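If I remember right, it may have been based on crushtool's reclassify feature, which does the same job. Roughly (untested here, and the bucket name pattern '%-ssd' is just an example that has to match your actual bucket names):

ceph osd getcrushmap -o original
crushtool -i original --reclassify \
    --reclassify-root default hdd \
    --reclassify-bucket %-ssd ssd default \
    -o adjusted
crushtool -i original --compare adjusted    # preview the expected data movement
ceph osd setcrushmap -i adjusted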

I can check tomorrow how our crush tree unfolds. Basically, there is a full copy of the hierarchy (a shadow hierarchy) for each device class, with its own weights etc.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
Sent: 29 September 2020 22:19:33
To: eblock; Frank Schilder
Cc: ceph-users
Subject: RE: Re: hdd pg's migrating when converting ssd class osd's

Yes, correct, this is coming from Luminous or maybe even Kraken. What does a default crush tree look like in Mimic or Octopus? And is there a manual describing how to bring this cluster to the new 'default'?


-----Original Message-----
Cc: ceph-users
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

Are these crush maps inherited from pre-mimic versions? I have re-balanced SSD and HDD pools in Mimic (deployed with Mimic) where one device class never influenced the placement of the other. I have mixed hosts and went as far as introducing rbd_meta, rbd_data and similar classes to sub-divide even further (all these devices have different performance specs). This worked like a charm. When adding devices of one class, only pools in this class were ever affected.
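For reference, tagging an OSD with such a custom class is just the following (the OSD id and the class/rule names here are only examples):

ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class rbd_meta osd.12
ceph osd crush rule create-replicated rbd-meta-rule default host rbd_meta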

As far as I understand, starting with Mimic, every shadow class defines a separate tree (not just the leaves/OSDs). Thus, device classes are independent of each other.
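You can inspect these per-class trees, each with its own bucket IDs and weights, directly:

ceph osd crush tree --show-shadow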



________________________________________
Sent: 29 September 2020 20:54:48
To: eblock
Cc: ceph-users
Subject: Re: hdd pg's migrating when converting ssd class osd's

Yes, correct, hosts indeed have both SSDs and HDDs combined. Isn't this more of a bug then? I would assume the goal of using device classes is to separate these so that one does not affect the other. The per-class host weights for the ssd and hdd classes are already available, so the algorithm should just use those instead of the weight of the whole host. Or is there some specific use case where combining these classes is required?


-----Original Message-----
Cc: ceph-users
Subject: Re: Re: hdd pg's migrating when converting ssd class osd's

They're still in the same root (default) and each host is a member of both device classes; I guess you have a mixed setup (hosts c01/c02 have both HDDs and SSDs)? I don't think this separation is enough to avoid remapping even when a different device class is affected (your report confirms that).

Dividing the crush tree into different subtrees might help here, but I'm not sure that's really something you need. You might also just live with the remapping as long as it doesn't happen too often. On the other hand, if your setup won't change (except for adding more OSDs), you might as well think about a different crush tree; a sketch follows below. It really depends on your actual requirements.
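A rough sketch of what such a split could look like (bucket and host names are made up, and moving OSDs like this does trigger data movement):

ceph osd crush add-bucket ssdroot root
ceph osd crush add-bucket c01-ssd host
ceph osd crush move c01-ssd root=ssdroot
ceph osd crush set osd.5 1.0 root=ssdroot host=c01-ssd
ceph osd crush rule create-replicated rbd-ssd ssdroot host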

We created two different subtrees when we got new hardware, and it helped us a lot: we moved the data only once, onto the new hardware, avoiding multiple remappings. Now the older hardware is our EC environment, except for some SSDs on those old hosts that had to stay in the main subtree. So our setup is also very individual, but it works quite nicely. :-)


Quoting:

I have practically a default setup. If I do 'ceph osd crush tree --show-shadow' I get a listing like this [1]. I would assume from the hosts being listed within both default~ssd and default~hdd that they are separate (enough)?


[1]
root default~ssd
     host c01~ssd
..
..
     host c02~ssd
..
root default~hdd
     host c01~hdd
..
     host c02~hdd
..
root default




-----Original Message-----
To: ceph-users@xxxxxxx
Subject: Re: hdd pg's migrating when converting ssd class osd's

Are all the OSDs in the same crush root? I would think that since the crush weight of a host changes as soon as its OSDs are out, it impacts the whole crush tree. If you separate the SSDs from the HDDs logically (e.g. with a different bucket type in the crush tree), the remapping wouldn't affect the HDDs.




I have been converting SSD OSDs to dmcrypt, and I have noticed that PGs are migrated for pools that should be (and are?) on the hdd class.

On an otherwise healthy cluster, when I set the crush reweight of an SSD OSD to 0.0, I get this:

17.35  10415  0  0  9907  0  36001743890  0  0  3045  3045
active+remapped+backfilling  2020-09-27 12:55:49.093054
83758'20725398  83758:100379720  [8,14,23]  8  [3,14,23]  3
83636'20718129  2020-09-27 00:58:07.098096
83300'20689151  2020-09-24 21:42:07.385360  0

However, OSDs 3, 14, 23 and 8 are all hdd OSDs.
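(The reweight step above was just the usual command, with the id of whichever SSD OSD was being converted; osd.30 here is a made-up example id:

ceph osd crush reweight osd.30 0.0
)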

Since this is a cluster from Kraken/Luminous times, I am not sure whether the device class of replicated_ruleset [1] was already set when pool 17 was created. The weird thing is that all PGs of this pool seem to be on hdd OSDs [2].

Q. How can I display the definition of 'crush_rule 0' as it was at the time of the pool's creation? (To be sure it already had this device class hdd configured.)
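(I guess, if the monitors still keep osdmaps that far back, something like this could show the rule as of an old epoch; the epoch number 20000 is made up:)

ceph osd getmap 20000 -o osdmap.20000
osdmaptool osdmap.20000 --export-crush crush.20000
crushtool -d crush.20000 -o crush.20000.txt
grep -A10 replicated_ruleset crush.20000.txt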



[1]
[@~]# ceph osd pool ls detail | grep 'pool 17'
pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd


[@~]# ceph osd crush rule dump replicated_ruleset
{
    "rule_id": 0,
    "rule_name": "replicated_ruleset",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

[2]
[@~]# for osd in `ceph pg dump pgs | grep '^17' | awk '{print $17" "$19}' \
    | grep -oE '[0-9]{1,2}' | sort -u -n`; do \
    ceph osd crush get-device-class osd.$osd; done | sort -u
dumped pgs
hdd





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





