Re: Best way to change bucket hierarchy


 



It's hard to tell without knowing what the diff is, but from your description I take it that you changed the failure domain for every(?) pool from host to chassis. I don't know what a chassis is in your architecture, but if each chassis contains several host buckets, then yes, I would expect almost every PG to be affected.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyriazis@xxxxxxxxx>
Sent: 05 June 2020 00:28:43
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Hmm,

So I tried all that, and I got almost all of my PGs being remapped.  Crush map looks correct.  Is that normal?

Thanks,

George


On Jun 4, 2020, at 2:33 PM, Frank Schilder <frans@xxxxxx> wrote:

Hi George,

you don't need to worry about that too much. The EC profile contains two types of information: one part about the actual EC encoding and another about crush parameters. Unfortunately, part of this information is mutable after pool creation while the rest is not, where "mutable" means changeable outside of the profile. You can change the failure domain in the crush map without issues, but the profile won't reflect that change. That's an inconsistency we currently have to live with; it would have been better to separate mutable data (like the failure domain) from immutable data (like k and m), or to provide a meaningful interface that keeps the mutable information consistent.

In short, don't believe everything the EC profile tells you. Some information might be out of date, like the failure domain or the device class (basically everything starting with crush-). If you remember that, you are out of trouble. Always dump the crush rule of an EC pool explicitly to see the true parameters in action.
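To dump the rule actually in effect, something like the following should work (pool and rule names here are examples, not from this thread; these commands need a running cluster):

```shell
# Which crush rule does the pool really use right now?
ceph osd pool get ec22-pool crush_rule

# Dump that rule; the "type" in its chooseleaf step is the
# failure domain in effect, regardless of what the EC profile says.
ceph osd crush rule dump ec22-pool
```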

Having said that, to change the failure domain for an EC pool, change the crush rule that was created from the EC profile - I did this too and it works just fine. By default, the crush rule has the same name as the pool. I'm afraid you will have to edit the crush rule manually here, as Wido explained. There is no other way - at least not currently.

You can ask in this list for confirmation that your change is doing what you want.

Do not try to touch an EC profile; they are read-only anyway. The crush parameters in it are only used at pool creation and never looked at again. You can override them by editing the crush rule as explained above.

Best regards and good luck,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyriazis@xxxxxxxxx>
Sent: 04 June 2020 20:56:38
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  Looks like I need to change the EC profile, too, but since it is associated with the pool, maybe I can't do that after pool creation?  Or, since the property is named "crush-failure-domain", is it automatically inherited from the crush rule, so I don't have to do anything?

Thanks,

George


On Jun 4, 2020, at 1:51 AM, Frank Schilder <frans@xxxxxx> wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the failure domain set to chassis and change any pool's crush rule to this new one. If you have EC pools, then the chooseleaf step needs to be edited by hand. I did this before as well. (A really unfortunate side effect is that the EC profile attached to the pool goes out of sync with the crush map and there is nothing one can do about that. This is annoying yet harmless.)
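For the replicated case this can be done entirely with the CLI; a sketch with hypothetical rule and pool names:

```shell
# New replicated rule rooted at "default" with chassis failure domain:
ceph osd crush rule create-replicated rep-chassis default chassis

# Point an existing replicated pool at the new rule:
ceph osd pool set mypool crush_rule rep-chassis
```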

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set, and there is then a shorter interruption to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea to have a backup anyway, just for reference and to compare before and after.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyriazis@xxxxxxxxx>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the chooseleaf step would also have to change to:

      step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule change can happen both inside and outside the "norebalance" setting (again with CLI commands), but is it safer to do it inside (i.e. while not rebalancing)?
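For a simple replicated rule, that chooseleaf line is indeed the only change; the edited rule would then look roughly like this (rule name and id are illustrative):

```
rule replicated_chassis {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type chassis
    step emit
}
```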

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I assume I can restore the old map if something goes bad?
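Yes; the backup/restore pair is just (file name is an example):

```shell
# Back up the current crush map before any change:
ceph osd getcrushmap -o crushmap.backup

# If something goes wrong, inject the old map again:
ceph osd setcrushmap -i crushmap.backup
```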

Thanks again!

George



On Jun 3, 2020, at 5:24 PM, Frank Schilder <frans@xxxxxx> wrote:

You can use the command-line without editing the crush map. Look at the documentation of commands like

ceph osd crush add-bucket ...
ceph osd crush move ...

Before starting this, set "ceph osd set norebalance" and unset after you are happy with the crush tree. Let everything peer. You should see misplaced objects and remapped PGs, but no degraded objects or PGs.
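Putting those steps together, the sequence could look like this (bucket and host names are made up for illustration; run against a live cluster):

```shell
# Pause rebalancing while restructuring the tree:
ceph osd set norebalance

# Create a chassis bucket and splice it between root and host:
ceph osd crush add-bucket chassis1 chassis
ceph osd crush move chassis1 root=default
ceph osd crush move host1 chassis=chassis1

# Inspect the result, let PGs peer, then resume:
ceph osd crush tree
ceph osd unset norebalance
```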

Do this only when the cluster is HEALTH_OK; otherwise things can get really complicated.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyriazis@xxxxxxxxx>
Sent: 03 June 2020 22:45:11
To: ceph-users
Subject:  Best way to change bucket hierarchy

Hello,

I have a live ceph cluster, and I need to modify the bucket hierarchy.  I am currently using the default crush rule (i.e. keep each replica on a different host).  My need is to add a "chassis" level and keep replicas on a per-chassis basis.

From what I read in the documentation, I would have to edit the crush file manually, however this sounds kinda scary for a live cluster.

Are there any “best known methods” to achieve that goal without messing things up?

In my current scenario, I have one host per chassis, and I am planning on later adding nodes where there would be more than one host per chassis. It looks like "in theory" there wouldn't be a need for any data movement after the crush map changes.  Will reality match theory? Anything else I need to watch out for?

Thank you!

George

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



