Re: No rebalance after ceph osd crush unlink

Hi Dan,

> I don't suggest editing the crush map, compiling, then re-injecting --
> I don't know what it will do in this case.

He he, too late :). It does the right thing; rebalancing is almost finished. There is only one strange observation: one PG in a pool that sits on the same hosts, but whose crush rule picks from a different and unaffected root, also flipped two disks. I have no clue why, since the rules apply to different sub-trees and I edited only one of them. Maybe there is a very subtle weight change?

Anyway, editing the crush map induces lots of sweating, but it does the right thing if edited correctly (big if :). In any case, I always keep a safety copy to roll back to.
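
For reference, the cycle boils down to the usual decompile/edit/recompile dance (file names are just examples):

    ceph osd getcrushmap -o crush.bin.backup     # safety copy to roll back to
    crushtool -d crush.bin.backup -o crush.txt   # decompile to text
    vi crush.txt                                 # edit buckets/rules by hand
    crushtool -c crush.txt -o crush.bin.new      # recompile
    ceph osd setcrushmap -i crush.bin.new        # inject the edited map
    # and if things go sideways:
    # ceph osd setcrushmap -i crush.bin.backup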

Thanks for pointing me to the shadow tree!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: 18 May 2022 13:44:50
To: Frank Schilder
Cc: ceph-users
Subject: Re:  No rebalance after ceph osd crush unlink

Hi,

It's interesting that crushtool doesn't include the shadow tree -- I
am pretty sure that used to be included. I don't suggest editing the
crush map, compiling, then re-injecting -- I don't know what it will
do in this case.

What you could do instead is something like:
* ceph osd getcrushmap -o crush.map # backup the map
* ceph osd set norebalance # disable rebalancing while we experiment
* ceph osd crush reweight-all # see if this fixes the crush shadow tree

Then unset norebalance if the crush tree looks good. Or, if the crush
tree isn't what you expect, revert to your backup with `ceph osd
setcrushmap -i crush.map`.
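
Concretely, something along these lines (re-using the backup from the first step):

    ceph osd crush tree --show-shadow    # check the shadow buckets now match the outer tree
    ceph osd unset norebalance           # if it looks right, let the data move
    # otherwise revert:
    # ceph osd setcrushmap -i crush.map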

-- dan



On Wed, May 18, 2022 at 12:47 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan,
>
> thanks for pointing me to this. Yes, it looks like a/the bug: the shadow tree is not changed although it should be updated as well. It is not even shown in the crush map I exported with getcrushmap; the option --show-shadow did the trick.
>
> Will `ceph osd crush reweight-all` actually remove these shadow leaves or just set their weight to 0? I need to link this host again later and would like a solution that is as clean as possible. What would happen, for example, if I edit the crush map and execute setcrushmap? Will it recompile the correct crush map from the textual definition, or will these hanging leaves persist?
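>
> A quick before/after comparison should show what reweight-all really does to those shadow entries (just a sketch, with norebalance set first so nothing moves while I look):
>
>     ceph osd set norebalance
>     ceph osd crush tree --show-shadow    # note the ceph-18 shadow entries and weights
>     ceph osd crush reweight-all
>     ceph osd crush tree --show-shadow    # are they gone, or just at weight 0?
>     ceph osd unset norebalance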
>
> Thanks!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dvanders@xxxxxxxxx>
> Sent: 18 May 2022 12:04:07
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re:  No rebalance after ceph osd crush unlink
>
> Hi Frank,
>
> Did you check the shadow tree (the one with tilde's in the name, seen
> with `ceph osd crush tree --show-shadow`)? Maybe the host was removed
> in the outer tree, but not the one used for device-type selection.
> There were bugs in this area before, e.g. https://tracker.ceph.com/issues/48065
> In those cases, the way to make the crush tree consistent again was
> `ceph osd crush reweight-all`.
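>
> For example, something along these lines should show whether ceph-18 still hangs under one of the MultiSite~<class> shadow buckets (the grep pattern is just an illustration):
>
>     ceph osd crush tree --show-shadow | grep -E 'MultiSite~|ceph-18'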
>
> Cheers, Dan
>
>
>
> On Wed, May 18, 2022 at 11:51 AM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Dear all,
> >
> > I have a strange problem. I have some hosts linked under an additional logical data center and needed to unlink two of the hosts. After unlinking the first host with
> >
> > ceph osd crush unlink ceph-18 MultiSite
> >
> > the crush map for this data center is updated correctly:
> >
> > datacenter MultiSite {
> >         id -148         # do not change unnecessarily
> >         id -149 class hdd               # do not change unnecessarily
> >         id -150 class ssd               # do not change unnecessarily
> >         id -236 class rbd_meta          # do not change unnecessarily
> >         id -200 class rbd_data          # do not change unnecessarily
> >         id -320 class rbd_perf          # do not change unnecessarily
> >         # weight 643.321
> >         alg straw2
> >         hash 0  # rjenkins1
> >         item ceph-04 weight 79.691
> >         item ceph-05 weight 81.474
> >         item ceph-06 weight 79.691
> >         item ceph-07 weight 79.691
> >         item ceph-19 weight 81.695
> >         item ceph-20 weight 81.695
> >         item ceph-21 weight 79.691
> >         item ceph-22 weight 79.691
> > }
> >
> > The host is gone. However, nothing happened. The pools with the crush rule
> >
> > rule ms-ssd {
> >         id 12
> >         type replicated
> >         min_size 1
> >         max_size 10
> >         step take MultiSite class rbd_data
> >         step chooseleaf firstn 0 type host
> >         step emit
> > }
> >
> > should now move data away from the OSDs on this host, but nothing is happening. For example, one pool with crush rule ms-ssd is configured as follows:
> >
> > # ceph osd pool get sr-rbd-meta-one all
> > size: 3
> > min_size: 2
> > pg_num: 128
> > pgp_num: 128
> > crush_rule: ms-ssd
> > hashpspool: true
> > nodelete: true
> > nopgchange: false
> > nosizechange: false
> > write_fadvise_dontneed: false
> > noscrub: false
> > nodeep-scrub: false
> > use_gmt_hitset: 1
> > auid: 0
> > fast_read: 0
> >
> > However, it's happily keeping data on the OSDs of host ceph-18. For example, one of the OSDs on this host has ID 1076. There are 4 PGs using this OSD:
> >
> > # ceph pg ls-by-pool sr-rbd-meta-one | grep 1076
> > 1.33     250        0         0       0 756156481        7834        125 3073 active+clean 2022-05-18 10:54:41.840097 757122'10112944  757122:84604327    [574,286,1076]p574    [574,286,1076]p574 2022-05-18 04:24:32.900261 2022-05-11 19:56:32.781889
> > 1.3d     259        0         0       0 796239360        3380         64 3006 active+clean 2022-05-18 10:54:41.749090 757122'24166942  757122:57010202 [1074,1076,1052]p1074 [1074,1076,1052]p1074 2022-05-18 06:16:35.605026 2022-05-16 19:37:56.829763
> > 1.4d     249        0         0       0 713678948        5690        105 3070 active+clean 2022-05-18 10:54:41.738918  757119'5861104  757122:45718157  [1072,262,1076]p1072  [1072,262,1076]p1072 2022-05-18 06:50:04.731194 2022-05-18 06:50:04.731194
> > 1.70     272        0         0       0 814317398        4591         76 3007 active+clean 2022-05-18 10:54:41.743604 757122'11849453  757122:72537747    [268,279,1076]p268    [268,279,1076]p268 2022-05-17 15:43:46.512941 2022-05-17 15:43:46.512941
> >
> > I don't understand why these are not remapped and rebalancing. Any ideas?
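> >
> > If it helps, a crushtool test-mapping against the exported map should show whether CRUSH itself still selects OSDs on ceph-18 (a sketch; rule id 12 and num-rep 3 as above, the x-range and file name are arbitrary):
> >
> >     ceph osd getcrushmap -o crush.bin
> >     crushtool -i crush.bin --test --rule 12 --num-rep 3 --min-x 0 --max-x 127 --show-mappings | grep -w 1076
> >     # any hits mean the compiled map still places replicas on OSD 1076 (host ceph-18)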
> >
> > Version is mimic latest.
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



