Philippe,
Maybe you can try the "crush-compat" balancer mode instead of
"upmap" until the new code is released.
David
On 12/11/19 9:36 PM, Philippe D'Anjou wrote:
Hi,
I see your code balanced my ssdpool to about 146 PGs each. I can confirm
this did NOT happen on my live cluster.
The state it ended up in is:
 ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL     %USE   VAR   PGS  STATUS
  0  ssd    3.49219  1.00000   3.5 TiB  797 GiB  757 GiB  36 GiB  3.9 GiB  2.7 TiB   22.30  0.31  147  up
  1  ssd    3.49219  1.00000   3.5 TiB  803 GiB  751 GiB  49 GiB  3.7 GiB  2.7 TiB   22.47  0.31  146  up
  2  ssd    3.49219  1.00000   3.5 TiB  818 GiB  764 GiB  51 GiB  3.8 GiB  2.7 TiB   22.89  0.32  150  up
  3  ssd    3.49219  1.00000   3.5 TiB  794 GiB  757 GiB  34 GiB  3.2 GiB  2.7 TiB   22.21  0.31  146  up
  4  ssd    3.49219  1.00000   3.5 TiB  837 GiB  798 GiB  34 GiB  4.4 GiB  2.7 TiB   23.39  0.32  156  up
  6  ssd    3.49219  1.00000   3.5 TiB  790 GiB  751 GiB  35 GiB  3.6 GiB  2.7 TiB   22.09  0.31  146  up
  8  ssd    3.49219  1.00000   3.5 TiB  874 GiB  831 GiB  40 GiB  3.5 GiB  2.6 TiB   24.44  0.34  156  up
 10  ssd    3.49219  1.00000   3.5 TiB  807 GiB  761 GiB  43 GiB  3.4 GiB  2.7 TiB   22.58  0.31  146  up
  5  ssd    3.49219  1.00000   3.5 TiB  744 GiB  708 GiB  32 GiB  4.2 GiB  2.8 TiB   20.81  0.29  141  up
  7  ssd    3.49219  1.00000   3.5 TiB  732 GiB  690 GiB  39 GiB  3.2 GiB  2.8 TiB   20.48  0.28  136  up
  9  ssd    3.49219  1.00000   3.5 TiB  702 GiB  657 GiB  42 GiB  3.9 GiB  2.8 TiB   19.64  0.27  131  up
 11  ssd    3.49219  1.00000   3.5 TiB  805 GiB  781 GiB  22 GiB  2.3 GiB  2.7 TiB   22.50  0.31  138  up
101  ssd    3.49219  1.00000   3.5 TiB  835 GiB  793 GiB  38 GiB  3.7 GiB  2.7 TiB   23.36  0.32  146  up
103  ssd    3.49219  1.00000   3.5 TiB  846 GiB  803 GiB  40 GiB  3.3 GiB  2.7 TiB   23.67  0.33  150  up
105  ssd    3.49219  1.00000   3.5 TiB  800 GiB  762 GiB  36 GiB  2.5 GiB  2.7 TiB   22.38  0.31  148  up
107  ssd    3.49219  1.00000   3.5 TiB  843 GiB  790 GiB  49 GiB  3.4 GiB  2.7 TiB   23.58  0.33  147  up
100  ssd    3.49219  1.00000   3.5 TiB  804 GiB  753 GiB  48 GiB  2.6 GiB  2.7 TiB   22.47  0.31  144  up
102  ssd    3.49219  1.00000   3.5 TiB  752 GiB  737 GiB  13 GiB  2.4 GiB  2.8 TiB   21.02  0.29  141  up
104  ssd    3.49219  1.00000   3.5 TiB  805 GiB  771 GiB  31 GiB  2.8 GiB  2.7 TiB   22.50  0.31  144  up
106  ssd    3.49219  1.00000   3.5 TiB  793 GiB  724 GiB  66 GiB  2.9 GiB  2.7 TiB   22.17  0.31  143  up
108  ssd    3.49219  1.00000   3.5 TiB  816 GiB  778 GiB  36 GiB  2.7 GiB  2.7 TiB   22.83  0.32  156  up
109  ssd    3.49219  1.00000   3.5 TiB  811 GiB  763 GiB  45 GiB  2.8 GiB  2.7 TiB   22.68  0.31  146  up
110  ssd    3.49219  1.00000   3.5 TiB  863 GiB  832 GiB  28 GiB  2.5 GiB  2.6 TiB   24.13  0.33  154  up
111  ssd    3.49219  1.00000   3.5 TiB  784 GiB  737 GiB  45 GiB  2.7 GiB  2.7 TiB   21.92  0.30  146  up
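For what it's worth, this is roughly how I'm checking the spread (a quick awk sketch over "ceph osd df"; it assumes PGS is the second-to-last column and STATUS the last, so adjust if your output differs):

    ceph osd df | awk '$2 == "ssd" && $NF == "up" {
        pgs = $(NF-1)                           # PGS column
        if (min == "" || pgs < min) min = pgs
        if (pgs > max) max = pgs
        sum += pgs; n++
    } END { printf "ssd osds=%d min=%d max=%d avg=%.1f PGs\n", n, min, max, sum / n }'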
It did not try to balance any further. Someone else reported the same issue.
I am pretty sure it will also not balance out the HDDs as neatly as you
got them in your test. There is definitely an issue somewhere; so far
three people are telling the same story. I never had this issue under
Luminous, but I have been fighting it for four months on two clusters.
One was upgraded to Nautilus and the other (the one these pastes are
from) is a fresh 14.2.4 install.
Any ideas on that?
Thanks
On Thursday, 12 December 2019 at 02:09:33 EET, David Zafman
<dzafman@xxxxxxxxxx> wrote:
Philippe,
I have a master branch version of the code to test. The Nautilus
backport https://github.com/ceph/ceph/pull/31956 should be the same.
Using your OSDMap, the code in the master branch, and some additional changes
to osdmaptool, I was able to balance your cluster. The osdmaptool
changes simulate the mgr's active balancer behavior. It never took more
than 0.13991 seconds to calculate more upmaps per round, and that's
on a virtual machine used for development. It took 35 rounds with a
maximum of 10 upmaps per round for each crush rule's set of pools. With
the default 1-minute sleep inside the mgr it would take 35 minutes.
Obviously, recovery/backfill has to finish before the cluster settles
into the new configuration. It needed 397 additional upmaps and removed 8.
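For reference, one round of that kind of offline run looks roughly like this (a sketch only; check osdmaptool --help for the exact flags your build supports):

    ceph osd getmap -o om                    # grab the current OSDMap
    osdmaptool om --upmap upmaps.txt \
        --upmap-max 10 --upmap-deviation 1   # compute at most 10 upmaps for this round
    cat upmaps.txt                           # review the generated "ceph osd pg-upmap-items ..." commands

Applying the file (for example with "source upmaps.txt") and repeating the above approximates the round-by-round behavior of the mgr balancer.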
Because all pools for a given crush rule are balanced together, you can
see that this is more balanced than Rich's configuration using Luminous.
This balancer code is subject to change before the next Nautilus point
release is finalized.
Final layout:
osd.0 pgs 146
osd.1 pgs 146
osd.2 pgs 146
osd.3 pgs 146
osd.4 pgs 146
osd.5 pgs 146
osd.6 pgs 146
osd.7 pgs 146
osd.8 pgs 146
osd.9 pgs 146
osd.10 pgs 146
osd.11 pgs 146
osd.12 pgs 74
osd.13 pgs 74
osd.14 pgs 73
osd.15 pgs 74
osd.16 pgs 74
osd.17 pgs 74
osd.18 pgs 73
osd.19 pgs 74
osd.20 pgs 73
osd.21 pgs 73
osd.22 pgs 74
osd.23 pgs 73
osd.24 pgs 73
osd.25 pgs 75
osd.26 pgs 74
osd.27 pgs 74
osd.28 pgs 73
osd.29 pgs 73
osd.30 pgs 73
osd.31 pgs 73
osd.32 pgs 74
osd.33 pgs 73
osd.34 pgs 73
osd.35 pgs 74
osd.36 pgs 74
osd.37 pgs 74
osd.38 pgs 74
osd.39 pgs 74
osd.40 pgs 73
osd.41 pgs 73
osd.42 pgs 73
osd.43 pgs 73
osd.44 pgs 74
osd.45 pgs 73
osd.46 pgs 73
osd.47 pgs 73
osd.48 pgs 73
osd.49 pgs 73
osd.50 pgs 73
osd.51 pgs 73
osd.52 pgs 75
osd.53 pgs 59
osd.54 pgs 74
osd.55 pgs 74
osd.56 pgs 74
osd.57 pgs 73
osd.58 pgs 74
osd.59 pgs 74
osd.60 pgs 74
osd.61 pgs 74
osd.62 pgs 73
osd.63 pgs 74
osd.64 pgs 73
osd.65 pgs 74
osd.66 pgs 74
osd.67 pgs 74
osd.68 pgs 73
osd.69 pgs 74
osd.70 pgs 73
osd.71 pgs 73
osd.72 pgs 73
osd.73 pgs 73
osd.74 pgs 73
osd.75 pgs 73
osd.76 pgs 73
osd.77 pgs 73
osd.78 pgs 73
osd.79 pgs 73
osd.80 pgs 73
osd.81 pgs 73
osd.82 pgs 73
osd.83 pgs 73
osd.84 pgs 73
osd.85 pgs 73
osd.86 pgs 73
osd.87 pgs 73
osd.88 pgs 73
osd.89 pgs 73
osd.90 pgs 73
osd.91 pgs 73
osd.92 pgs 73
osd.93 pgs 73
osd.94 pgs 73
osd.95 pgs 73
osd.96 pgs 73
osd.97 pgs 73
osd.98 pgs 73
osd.99 pgs 73
osd.100 pgs 146
osd.101 pgs 146
osd.102 pgs 146
osd.103 pgs 146
osd.104 pgs 146
osd.105 pgs 146
osd.106 pgs 146
osd.107 pgs 146
osd.108 pgs 146
osd.109 pgs 146
osd.110 pgs 146
osd.111 pgs 146
osd.112 pgs 73
osd.113 pgs 73
osd.114 pgs 73
osd.115 pgs 73
osd.116 pgs 73
osd.117 pgs 73
osd.118 pgs 73
osd.119 pgs 73
osd.120 pgs 73
osd.121 pgs 73
osd.122 pgs 73
osd.123 pgs 73
osd.124 pgs 73
osd.125 pgs 73
osd.126 pgs 73
osd.127 pgs 74
osd.128 pgs 73
osd.129 pgs 73
osd.130 pgs 73
osd.131 pgs 73
osd.132 pgs 73
osd.133 pgs 73
osd.134 pgs 73
osd.135 pgs 73
David
On 12/10/19 9:59 PM, Philippe D'Anjou wrote:
> Given I was told it's an issue of too few PGs, I am raising them and testing
> this, although my SSDs, which have about 150 PGs each, are also not well
> distributed.
> I attached my OSDMap; I'd appreciate it if you could run your test on it
> like you did with the other guy, so I know whether this will ever
> distribute equally or not.
>
> If you're too busy, I understand that too; just ignore this.
>
> Thanks in either case. I have just been dealing with this for months
> now and it is getting frustrating.
>
> Best regards
>
> On Tuesday, 10 December 2019 at 03:53:17 EET, David Zafman
> <dzafman@xxxxxxxxxx> wrote:
>
>
>
> Please file a tracker with the symptom and examples. Please attach your
> OSDMap (ceph osd getmap > osdmap.bin).
>
> Note that https://github.com/ceph/ceph/pull/31956 has the Nautilus
> version of improved upmap code. It also changes osdmaptool to match the
> mgr behavior, so that one can observe the behavior of the upmap balancer
> offline.
>
> Thanks
>
> David
>
> On 12/8/19 11:04 AM, Philippe D'Anjou wrote:
> > It's only getting worse after raising PGs now.
> >
> > Anything between:
> > 96  hdd  9.09470  1.00000  9.1 TiB  4.9 TiB  4.9 TiB  97 KiB  13 GiB  4.2 TiB   53.62  0.76  54  up
> >
> > and
> >
> > 89  hdd  9.09470  1.00000  9.1 TiB  8.1 TiB  8.1 TiB  88 KiB  21 GiB  1001 GiB  89.25  1.27  87  up
> >
> > How is that possible? I don't know how much more proof I need to
> > present that there's a bug.
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx