Re: ceph osd pg-upmap-items not working

Because I have tested failing the mgr and rebooting all the servers in random order multiple times. The upmap optimizer never found more optimizations to do after the initial ones. I tried leaving the balancer on for days, and also turning it off and running it manually several times.

I manually moved just a few PGs from the fullest disks to the emptiest ones and free space increased by 7%, so the distribution was clearly not perfect.

I have now replaced 10 disks with larger ones, and after the resync finished I ran the upmap balancer again with similar results. It performed a few optimizations, but now reports "Error EALREADY: Unable to find further optimization, or distribution is already perfect".

Looking at a snippet from "ceph osd df tree"... you can see it's not quite perfect. I am wondering if this could be because of the size difference between OSDs, as I am running disks ranging from 1 to 10 TB in the same host.

ID   CLASS WEIGHT   REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS      NAME
17   hdd   1.09200  1.00000 1.09TiB  741GiB  377GiB 66.27 1.09  13         osd.17   
18   hdd   1.09200  1.00000 1.09TiB  747GiB  370GiB 66.86 1.10  13         osd.18   
19   hdd   1.09200  1.00000 1.09TiB  572GiB  546GiB 51.20 0.84  10         osd.19   
23   hdd   2.72899  1.00000 2.73TiB 1.70TiB 1.03TiB 62.21 1.02  31         osd.23   
29   hdd   1.09200  1.00000 1.09TiB  627GiB  491GiB 56.11 0.92  11         osd.29   
30   hdd   1.09200  1.00000 1.09TiB  574GiB  544GiB 51.34 0.84  10         osd.30   
32   hdd   2.72899  1.00000 2.73TiB 1.73TiB 1023GiB 63.41 1.04  31         osd.32   
43   hdd   2.72899  1.00000 2.73TiB 1.57TiB 1.16TiB 57.37 0.94  28         osd.43   
45   hdd   2.72899  1.00000 2.73TiB 1.68TiB 1.05TiB 61.51 1.01  30         osd.45  
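As a rough sanity check on the snippet above: with an ideal upmap distribution, each OSD's PG count should be close to proportional to its CRUSH weight. This small illustration (not a ceph tool; weights and PG counts copied from the rows quoted above) computes the expected share per OSD:

```python
# Weight-proportional PG expectation for the OSDs quoted above.
rows = {  # osd id -> (crush weight, pg count), copied from "ceph osd df tree"
    17: (1.09200, 13), 18: (1.09200, 13), 19: (1.09200, 10),
    23: (2.72899, 31), 29: (1.09200, 11), 30: (1.09200, 10),
    32: (2.72899, 31), 43: (2.72899, 28), 45: (2.72899, 30),
}
total_weight = sum(w for w, _ in rows.values())
total_pgs = sum(n for _, n in rows.values())
for osd, (w, n) in sorted(rows.items()):
    expected = total_pgs * w / total_weight
    print(f"osd.{osd}: {n} PGs, expected ~{expected:.1f} ({n - expected:+.1f})")
```

On these nine OSDs the spread is only about ±2 PGs around the weight-proportional share, but because the per-OSD PG counts are so small (10-31), a single PG is worth several percent of %USE, which matches the 51-66% spread shown.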


Config keys are as follows:
"mgr/balancer/max_misplaced" = 1
"mgr/balancer/upmap_max_deviation" = 0.0001
"mgr/balancer/upmap_max_iterations" = 1000

Any ideas what could cause this? Any info I can give to help diagnose?

On Fri, Mar 15, 2019 at 3:48 PM David Turner <drakonstein@xxxxxxxxx> wrote:
Why do you think that it can't resolve this by itself?  You just said that the balancer was able to provide an optimization, but then that the distribution isn't perfect.  When there are no further optimizations, running `ceph balancer optimize plan` won't create a plan with any changes.  Possibly the active mgr needs a kick.  When my cluster isn't balancing when it's supposed to, I just run `ceph mgr fail {active mgr}` and within a minute or so the cluster is moving PGs around.
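The workflow described here can be sketched as follows (the plan name "myplan" is arbitrary):

```shell
# Kick the active mgr, then drive the balancer manually:
ceph mgr dump | grep active_name     # identify the active mgr
ceph mgr fail <active-mgr-name>      # force failover; balancer re-evaluates
ceph balancer optimize myplan        # build a plan (EALREADY if none found)
ceph balancer show myplan            # inspect the proposed upmap commands
ceph balancer execute myplan         # apply the plan
```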

On Sat, Mar 9, 2019 at 8:05 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
Thanks


Running manual upmap commands works now. I ran "ceph balancer optimize new" and it did add a few upmaps.

But now there is another issue: distribution is far from perfect, yet the balancer can't find further optimizations.
Specifically, OSD 23 is getting way more PGs than the other 3TB OSDs.
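One way to double-check whether more optimization is actually possible is to run the upmap optimizer offline against a copy of the osdmap (a sketch; the exact osdmaptool flags vary somewhat across 12.2.x releases):

```shell
# Grab the current osdmap and ask osdmaptool for upmap proposals offline:
ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --upmap out.sh \
    --upmap-pool ec82_pool --upmap-max 100
cat out.sh    # proposed "ceph osd pg-upmap-items ..." commands, if any
```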


On Fri, Mar 1, 2019 at 10:25 AM <xie.xingguo@xxxxxxxxxx> wrote:

> Backports should be available in v12.2.11.

s/v12.2.11/v12.2.12/

Sorry for the typo.




Original Message
From: Xie Xingguo 10072465
Date: 2019-03-01 17:09
Subject: Re: [ceph-users] ceph osd pg-upmap-items not working

See https://github.com/ceph/ceph/pull/26179

Backports should be available in v12.2.11.

Or you can apply https://github.com/ceph/ceph/pull/26127 manually if you are eager to get out of the trap right now.







From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
To: Kári Bertilsson <karibertils@xxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>; Xie Xingguo 10072465
Date: 2019-03-01 14:48
Subject: Re: ceph osd pg-upmap-items not working
It looks like that somewhat unusual crush rule is confusing the new
upmap cleaning.
(debug_mon 10 on the active mon should show those cleanups).

I'm copying Xie Xingguo, and probably you should create a tracker for this.

-- dan




On Fri, Mar 1, 2019 at 3:12 AM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>
> This is the pool
> pool 41 'ec82_pool' erasure size 10 min_size 8 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 63794 lfor 21731/21731 flags hashpspool,ec_overwrites stripe_width 32768 application cephfs
>        removed_snaps [1~5]
>
> Here is the relevant crush rule:
> rule ec_pool {
>     id 1
>     type erasure
>     min_size 3
>     max_size 10
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default class hdd
>     step choose indep 5 type host
>     step choose indep 2 type osd
>     step emit
> }
>
> Both OSD 23 and 123 are in the same host, so this change should be perfectly acceptable under the rule.
> Something must be blocking the change, but I can't find anything about it in any logs.
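The rule picks 5 hosts and 2 OSDs per host, so a valid upmap must leave at most 2 shards of a PG on any one host. A toy sketch of that per-host check (host assignments here are hypothetical, for illustration only):

```python
# Toy sketch of the per-host constraint this crush rule implies:
# after applying [from->to] items, no host may hold more than 2 shards.
from collections import Counter

host_of = {17: "hostB", 18: "hostB", 23: "hostA", 24: "hostA", 123: "hostA"}

def upmap_keeps_rule(acting, items, max_per_host=2):
    """Apply pg_upmap_items-style swaps, then check the per-host shard cap."""
    mapped = [items.get(osd, osd) for osd in acting]
    per_host = Counter(host_of[osd] for osd in mapped)
    return all(n <= max_per_host for n in per_host.values())

# 23 -> 123 swaps within hostA, so per-host counts are unchanged: allowed.
print(upmap_keeps_rule([23, 17], {23: 123}))      # True
# Pulling a third shard onto hostA would violate the rule: rejected.
print(upmap_keeps_rule([23, 123, 17], {17: 24}))  # False
```

By this logic the 23->123 swap should indeed pass, which is why the silent rejection looks like a validation bug rather than a rule violation.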
>
> - Kári
>
> On Thu, Feb 28, 2019 at 8:07 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> pg-upmap-items became more strict in v12.2.11 when validating upmaps.
>> E.g., it now won't let you put two PGs in the same rack if the crush
>> rule doesn't allow it.
>>
>> Where are OSDs 23 and 123 in your cluster? What is the relevant crush rule?
>>
>> -- dan
>>
>>
>> On Wed, Feb 27, 2019 at 9:17 PM Kári Bertilsson <karibertils@xxxxxxxxx> wrote:
>> >
>> > Hello
>> >
>> > I am trying to diagnose why upmap stopped working where it was previously working fine.
>> >
>> > Trying to remap pg 41.1 from OSD 23 to OSD 123 has no effect and seems to be ignored.
>> >
>> > # ceph osd pg-upmap-items 41.1 23 123
>> > set 41.1 pg_upmap_items mapping to [23->123]
>> >
>> > No rebalancing happens, and if I run it again it shows the same output every time.
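A useful diagnostic here (a sketch using standard ceph CLI commands) is to check whether the entry actually landed in the osdmap, since newer mons may silently drop entries they consider invalid:

```shell
# Was the pg_upmap_items entry accepted into the osdmap at all?
ceph osd dump | grep pg_upmap_items
# And what mapping does the PG actually have right now?
ceph pg map 41.1
```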
>> >
>> > I have in config
>> >         debug mgr = 4/5
>> >         debug mon = 4/5
>> >
>> > Paste from mon & mgr logs. Also output from "ceph osd dump"
>> > https://pastebin.com/9VrT4YcU
>> >
>> >
>> > I ran "ceph osd set-require-min-compat-client luminous" a long time ago, and all servers running Ceph have been rebooted numerous times since then.
>> > But somehow I am still seeing "min_compat_client jewel". I believe upmap was previously working anyway with that "jewel" line present.
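To pin down the min-compat confusion, these standard commands (a sketch) show what the osdmap requires versus what connected clients actually advertise:

```shell
# What the osdmap currently requires of clients:
ceph osd dump | grep min_compat_client
# Feature bits of currently connected clients (available since Luminous):
ceph features
# If all clients report luminous features, re-apply the requirement:
ceph osd set-require-min-compat-client luminous
```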
>> >
>> > I see no indication in any logs why the upmap commands are being ignored.
>> >
>> > Any suggestions on how to debug further, or what could be the issue?


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
