Backfill proceeds in a make-before-break fashion to safeguard data, because Ceph is first and foremost about strong consistency. Say you have a 3R (replicated, size=3) pool and you make a change that moves data around. For a given PG, Ceph will complete a fourth copy of the data before removing one of the original three that is no longer placed where it belongs. If you time a pg query or pg dump just right, you may see this empirically, and it may well be what you are observing. Early in any change that moves data, many PGs and OSDs are updating in parallel, so the effect is most pronounced at the start. As the cluster works through the pending movement, there is a longer tail with less parallelism, and I would expect the phenomenon to become less pronounced.

— aad

> On Dec 20, 2024, at 4:46 AM, Eugen Block <eblock@xxxxxx> wrote:
>
> Could you be a little more specific? Do you have any numbers/terminal outputs to show? In general, higher usage is expected (temporarily) during backfill.
>
> Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
>
>> After a new rule has been set, is it normal that usage grows
>> significantly while the number of objects stays pretty much the same?
>>
>> Rok
>>
>> On Mon, Dec 2, 2024 at 10:45 AM Eugen Block <eblock@xxxxxx> wrote:
>>
>>> Yes, there will be a lot of data movement. But you can throttle
>>> backfill (are you on wpq instead of mclock?) and it will slowly drain
>>> the PGs from SSDs to HDDs to minimize client impact.
>>>
>>> Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
>>>
>>> > I didn't have any bad mappings.
>>> >
>>> > I'll wait until the backfill completes, then try to apply the new rules.
>>> >
>>> > Then I can probably expect some recovery to start so it can move
>>> > everything from ssd to hdd?
>>> >
>>> > On Sun, Dec 1, 2024 at 9:36 AM Eugen Block <eblock@xxxxxx> wrote:
>>> >
>>> >> It means that in each of the 1024 attempts, crush was able to find
>>> >> num-rep OSDs.
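For illustration, the OSD IDs can be pulled out of each such line with standard shell tools. The sample line below only mimics the general shape of crushtool --test --show-mappings output (rule number, attempt number, bracketed ID list); the IDs themselves are made up:

```shell
# A made-up line in the shape of crushtool --test --show-mappings output
line='CRUSH rule 1 x 0 [32,14,7,21,3]'

# Keep only the bracketed list, then split on commas (one OSD ID per line)
ids=$(printf '%s\n' "$line" | sed -n 's/.*\[\(.*\)\].*/\1/p' | tr ',' '\n')
printf '%s\n' "$ids"
```

On a live cluster, each extracted ID can then be spot-checked with ceph osd crush get-device-class osd.<id> to confirm it really maps to the intended device class.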
>>> >> Those are the OSD IDs in the brackets (like the acting
>>> >> set). You can then check the IDs (or at least some of them) for their
>>> >> device class in case you have doubts (I always do that with a couple
>>> >> of random sets). But it looks good to me; I assume you didn't have any
>>> >> bad mappings?
>>> >>
>>> >> Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
>>> >>
>>> >> > Thx.
>>> >> >
>>> >> > Can you explain mappings.txt a little bit?
>>> >> >
>>> >> > I assume that for every line in mappings.txt, crush rule 1 applies to
>>> >> > the OSDs in the square brackets?
>>> >> >
>>> >> > Rok
>>> >> >
>>> >> > On Thu, Nov 28, 2024 at 8:53 AM Eugen Block <eblock@xxxxxx> wrote:
>>> >> >
>>> >> >> Of course it's possible. You can either change this rule by extracting
>>> >> >> the crushmap, decompiling it, editing the "take" section, compiling it,
>>> >> >> and injecting it back into the cluster. Or you simply create a new rule
>>> >> >> with the class hdd specified and set this new rule for your pools. The
>>> >> >> first approach would be:
>>> >> >>
>>> >> >> 1. ceph osd getcrushmap -o crushmap.bin
>>> >> >>
>>> >> >> 2. crushtool -d crushmap.bin -o crushmap.txt
>>> >> >>
>>> >> >> 3. Open crushmap.txt with the editor of your choice and replace
>>> >> >>
>>> >> >>        step take default
>>> >> >>
>>> >> >> with:
>>> >> >>
>>> >> >>        step take default class hdd
>>> >> >>
>>> >> >> and save the file.
>>> >> >>
>>> >> >> 4. crushtool -c crushmap.txt -o crushmap.new
>>> >> >>
>>> >> >> 5. Test it with crushtool:
>>> >> >>
>>> >> >> crushtool -i crushmap.new --test --rule 1 --num-rep 5 --show-mappings | less
>>> >> >> crushtool -i crushmap.new --test --rule 1 --num-rep 5 --show-bad-mappings | less
>>> >> >>
>>> >> >> You shouldn't have bad mappings if everything is okay. Inspect the
>>> >> >> result of --show-mappings to see if the OSDs match your HDD OSDs.
>>> >> >>
>>> >> >> 6.
>>> >> >> ceph osd setcrushmap -i crushmap.new
>>> >> >>
>>> >> >> ####
>>> >> >>
>>> >> >> Alternatively, create a new rule if your EC profile(s) already have
>>> >> >> the correct crush-device-class set. If not, you can create a new one,
>>> >> >> but keep in mind that you can't change the k and m values for a given
>>> >> >> pool, so you need to ensure that you use the same k and m values:
>>> >> >>
>>> >> >> ceph osd erasure-code-profile set ec-profile-k3m2 k=3 m=2 \
>>> >> >>     crush-failure-domain=host crush-device-class=hdd
>>> >> >>
>>> >> >> ceph osd crush rule create-erasure rule-ec-k3m2 ec-profile-k3m2
>>> >> >>
>>> >> >> And here's the result:
>>> >> >>
>>> >> >> ceph osd crush rule dump rule-ec-k3m2 | grep -A2 take
>>> >> >>         "op": "take",
>>> >> >>         "item": -2,
>>> >> >>         "item_name": "default~hdd"
>>> >> >>
>>> >> >> Regards,
>>> >> >> Eugen
>>> >> >>
>>> >> >> Quoting Rok Jaklič <rjaklic@xxxxxxxxx>:
>>> >> >>
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > is it possible to set/change the following already-used rule so it
>>> >> >> > only uses hdd?
>>> >> >> > {
>>> >> >> >     "rule_id": 1,
>>> >> >> >     "rule_name": "ec32",
>>> >> >> >     "type": 3,
>>> >> >> >     "steps": [
>>> >> >> >         {
>>> >> >> >             "op": "set_chooseleaf_tries",
>>> >> >> >             "num": 5
>>> >> >> >         },
>>> >> >> >         {
>>> >> >> >             "op": "set_choose_tries",
>>> >> >> >             "num": 100
>>> >> >> >         },
>>> >> >> >         {
>>> >> >> >             "op": "take",
>>> >> >> >             "item": -1,
>>> >> >> >             "item_name": "default"
>>> >> >> >         },
>>> >> >> >         {
>>> >> >> >             "op": "chooseleaf_indep",
>>> >> >> >             "num": 0,
>>> >> >> >             "type": "host"
>>> >> >> >         },
>>> >> >> >         {
>>> >> >> >             "op": "emit"
>>> >> >> >         }
>>> >> >> >     ]
>>> >> >> > }
>>> >> >> >
>>> >> >> > Kind regards,
>>> >> >> > Rok
>>> >> >> > _______________________________________________
>>> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx
>>> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
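The in-place edit of the "take" step discussed in the thread can also be scripted rather than done in an editor. A minimal sketch, assuming GNU sed; the rule body below is only an illustrative fragment in the style of a decompiled crushmap, not a full map:

```shell
# Illustrative fragment of a decompiled crushmap rule (not a full map)
cat > crushmap.txt <<'EOF'
rule ec32 {
	id 1
	type erasure
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default
	step chooseleaf indep 0 type host
	step emit
}
EOF

# Constrain the rule to the hdd device class (the "take" section edit)
sed -i 's/step take default$/step take default class hdd/' crushmap.txt

grep 'step take' crushmap.txt
```

A full map edited this way would then go through crushtool -c and crushtool --test before being injected with ceph osd setcrushmap, as described in the thread.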