It looks like that somewhat unusual crush rule is confusing the new upmap cleaning. (debug_mon 10 on the active mon should show those cleanups.) I'm copying Xie Xingguo, and you should probably create a tracker for this.

-- dan

On Fri, Mar 1, 2019 at 3:12 AM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>
> This is the pool:
>
> pool 41 'ec82_pool' erasure size 10 min_size 8 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 63794 lfor 21731/21731 flags hashpspool,ec_overwrites stripe_width 32768 application cephfs
> removed_snaps [1~5]
>
> Here is the relevant crush rule:
>
> rule ec_pool {
>     id 1
>     type erasure
>     min_size 3
>     max_size 10
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default class hdd
>     step choose indep 5 type host
>     step choose indep 2 type osd
>     step emit
> }
>
> Both OSD 23 and 123 are in the same host, so this change should be perfectly acceptable under the rule set.
> Something must be blocking the change, but I can't find anything about it in any logs.
>
> - Kári
>
> On Thu, Feb 28, 2019 at 8:07 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> pg-upmap-items became more strict in v12.2.11 when validating upmaps.
>> E.g., it now won't let you place two shards of a PG in the same rack
>> if the crush rule doesn't allow it.
>>
>> Where are OSDs 23 and 123 in your cluster? What is the relevant crush rule?
>>
>> -- dan
>>
>> On Wed, Feb 27, 2019 at 9:17 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>> >
>> > Hello,
>> >
>> > I am trying to diagnose why upmap stopped working where it was previously working fine.
>> >
>> > Trying to move pg 41.1 to OSD 123 has no effect and seems to be ignored:
>> >
>> > # ceph osd pg-upmap-items 41.1 23 123
>> > set 41.1 pg_upmap_items mapping to [23->123]
>> >
>> > No rebalancing happens, and if I run it again it shows the same output every time.
>> >
>> > I have in my config:
>> >
>> > debug mgr = 4/5
>> > debug mon = 4/5
>> >
>> > Paste from mon & mgr logs, plus output from "ceph osd dump":
>> > https://pastebin.com/9VrT4YcU
>> >
>> > I ran "ceph osd set-require-min-compat-client luminous" a long time ago, and all servers running Ceph have been rebooted numerous times since then.
>> > But somehow I am still seeing "min_compat_client jewel". I believe upmap was previously working anyway with that "jewel" line present.
>> >
>> > I see no indication in any logs why the upmap commands are being ignored.
>> >
>> > Any suggestions on how to debug further, or what could be the issue?

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
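[Editor's note] For anyone hitting the same symptom, the debugging steps discussed above boil down to a short checklist. This is a sketch, not a verified recipe: it assumes a Luminous-or-later cluster, an admin keyring, and that the exact log output may differ by release.

```shell
# Raise mon debug logging so the mons log why an upmap is rejected or
# cleaned up, as Dan suggests (remember to lower it again afterwards).
ceph tell mon.* injectargs '--debug_mon 10'

# List any pg_upmap_items entries the cluster actually accepted;
# if 41.1 never appears here, the mapping was silently dropped.
ceph osd dump | grep pg_upmap

# Show which feature releases connected clients report; a lingering
# jewel-era client could explain the surprising "min_compat_client jewel".
ceph features
```

These commands are read-mostly apart from the injectargs call, so they are safe to run on a production cluster while the problem is live.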
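[Editor's note] Dan's point about the stricter v12.2.11 validation can be illustrated with a toy check. This is a hypothetical sketch, not Ceph code: the osd-to-host table below is hardcoded from the thread (on a real cluster you would derive it from `ceph osd tree`), and it models only the one constraint at issue here, namely that with `step choose indep 2 type osd` nested under `step choose indep 5 type host`, a replacement OSD must sit under the same host bucket as the OSD it replaces.

```shell
# Hypothetical osd -> host lookup, hardcoded from the thread:
# OSDs 23 and 123 share a host; OSD 42 is an invented OSD on another host.
host_of() {
  case "$1" in
    23|123) echo host-a ;;
    42)     echo host-b ;;
  esac
}

# An upmap item from_osd -> to_osd keeps the crush placement valid (in this
# simplified model) only if both OSDs live under the same host bucket.
upmap_ok() {
  [ "$(host_of "$1")" = "$(host_of "$2")" ]
}

upmap_ok 23 123 && echo "23->123 stays within one host: should be accepted"
upmap_ok 23 42  || echo "23->42 crosses hosts: validation would reject it"
```

By this model the 23->123 mapping in the thread is legal, which is why the silent rejection points at a validation bug rather than an invalid request.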