Hello
I previously enabled upmap and used automatic balancing with "ceph balancer on". I got very good results, and the OSDs ended up with almost perfectly even PG distribution.
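(For context, upmap was enabled the standard way described in the docs, roughly:)
# ceph osd set-require-min-compat-client luminous
# ceph balancer mode upmap
# ceph balancer on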
Now, after adding several new OSDs, auto balancing no longer seems to be working. OSDs are at 30-50% usage, where previously they were all at almost the same percentage.
I turned off the auto balancer and tried manually running a plan:
# ceph balancer reset
# ceph balancer optimize myplan
# ceph balancer show myplan
ceph osd pg-upmap-items 41.1 106 125 95 121 84 34 36 99 72 126
ceph osd pg-upmap-items 41.5 12 121 65 3 122 52 5 126
ceph osd pg-upmap-items 41.b 117 99 65 125
ceph osd pg-upmap-items 41.c 49 121 81 131
ceph osd pg-upmap-items 41.e 61 82 73 52 122 46 84 118
ceph osd pg-upmap-items 41.f 71 127 15 121 56 82
ceph osd pg-upmap-items 41.12 81 92
ceph osd pg-upmap-items 41.17 35 127 71 44
ceph osd pg-upmap-items 41.19 81 131 21 119 18 52
ceph osd pg-upmap-items 41.25 18 52 37 125 40 3 41 34 71 127 4 128
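I then executed the plan:
# ceph balancer execute myplan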
After running this plan there is no difference, and there is still a huge imbalance on the OSDs. Creating a new plan gives the same plan again.
# ceph balancer eval
current cluster score 0.015162 (lower is better)
Balancer eval shows quite a low number, so it seems to think the PG distribution is already optimized?
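(If it helps narrow this down, I believe the score can also be checked per pool and in more detail; "<pool-name>" here is just a placeholder:)
# ceph balancer eval <pool-name>
# ceph balancer eval-verbose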
Since I can't get this working again, I looked into the offline optimization described at http://docs.ceph.com/docs/mimic/rados/operations/upmap/
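For reference, the procedure from those docs is roughly the following (the bracketed options are optional and "<pool-name>" is a placeholder):
# ceph osd getmap -o om
# osdmaptool om --upmap out.txt [--upmap-pool <pool-name>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
The result is then applied with "source out.txt".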
I have 2 pools. The replicated pool uses 3 OSDs with the "10k" device class, and the remaining OSDs have the "hdd" device class.
The resulting out.txt contains a much larger plan, but it would map a lot of PGs to the "10k" OSDs (where they should not be), and I can't seem to find any way to exclude these 3 OSDs.
Any ideas on how to proceed?