We've encountered this problem a lot. As far as I know, the best practice is to make the distribution of PGs across OSDs as even as you can after you create the pool and before you write any data.
1. Disk utilization = (PGs per OSD) * (data per PG). Ceph is good at keeping the amount of data per PG almost even, but not the number of PGs per OSD. So you need to even out the PG count per OSD; you can use the reweight-by-pg / reweight-by-utilization commands, or write your own tool that adjusts OSD weights until you get a result you are satisfied with (see the sketch after this list).
2. Reweighting OSDs that already have data on them can cause significant data movement, which you should try to avoid. So do it before putting data in.
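A minimal sketch of that pre-fill workflow, assuming a release where ceph osd df and ceph osd reweight-by-pg are available (on older releases you can derive per-OSD PG counts from ceph pg dump instead); the threshold value and the pool name are only examples:

    # Show the current PG count and utilization for every OSD
    ceph osd df tree

    # While the pool is still empty, rebalance by PG count;
    # 110 = treat OSDs holding more than 110% of the average number of PGs as overloaded
    ceph osd reweight-by-pg 110 poolname

Re-check the PG spread after each run and repeat with a lower threshold until you are happy with the distribution.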
Thanks
Leidong
On Thursday, November 20, 2014 5:17 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
I think these numbers are about what is expected. You could try a couple of things to improve it, but neither of them is common:
1) increase the number of PGs (and pgp_num) a lot more. If you decide to experiment with this, watch your CPU and memory numbers carefully.
2) try to correct for the inequities manually by futzing with either the crush weights or the override weights (which reweight-by-utilization does, although by default it doesn't try to correct imbalances under 15%). Both options are sketched below.
But you're getting pretty close to full so I'd really recommend just expanding your storage cluster and swallowing the lost margins.
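For context on "about what is expected": with pg_num 4096, size 2, and the 36 OSDs shown in the crush map below, the average is roughly 4096 * 2 / 36 ≈ 228 PG copies per OSD, and pseudo-random placement gives a standard deviation of about sqrt(228) ≈ 15 PGs (~7%), so per-OSD spreads like the ones in the df output are plausible. A hedged sketch of both options follows; the command names are standard Ceph CLI, but the specific pg_num target, OSD ids, weights, and threshold are only illustrative assumptions:

    # Option 1: raise pg_num and then pgp_num (watch CPU and memory while PGs split and peer)
    ceph osd pool set poolname pg_num 8192
    ceph osd pool set poolname pgp_num 8192

    # Option 2a: nudge an overfull OSD's crush weight down (changes CRUSH placement directly)
    ceph osd crush reweight osd.200 3.2

    # Option 2b: lower only the override weight (0.0-1.0), leaving the crush weight alone
    ceph osd reweight 200 0.90

    # Option 2c: let Ceph compute the overrides;
    # 110 = treat OSDs above 110% of the average utilization as overloaded
    ceph osd reweight-by-utilization 110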
-Greg
On Wed, Nov 19, 2014 at 8:32 AM Stephane Boisvert <stephane.boisvert@xxxxxxxxxxxx> wrote:
Hi,
I know a lot of people have already asked these questions, but looking at all the answers I'm still having problems with my OSD balancing. The disks are 4TB drives. We are seeing utilization differences of up to 10%.
Here is a DF from one server:
/dev/sdc1 3.7T 3.0T 680G 82% /var/lib/ceph/osd/ceph-190
/dev/sdd1 3.7T 3.0T 707G 81% /var/lib/ceph/osd/ceph-191
/dev/sde1 3.7T 3.2T 466G 88% /var/lib/ceph/osd/ceph-192
/dev/sdf1 3.7T 3.1T 566G 85% /var/lib/ceph/osd/ceph-193
/dev/sdg1 3.7T 3.2T 531G 86% /var/lib/ceph/osd/ceph-194
/dev/sdh1 3.7T 3.3T 371G 91% /var/lib/ceph/osd/ceph-195
/dev/sdi1 3.7T 3.0T 707G 82% /var/lib/ceph/osd/ceph-196
/dev/sdj1 3.7T 3.3T 358G 91% /var/lib/ceph/osd/ceph-197
/dev/sdk1 3.7T 3.0T 662G 83% /var/lib/ceph/osd/ceph-198
/dev/sdl1 3.7T 3.0T 669G 83% /var/lib/ceph/osd/ceph-199
/dev/sdm1 3.7T 3.5T 186G 96% /var/lib/ceph/osd/ceph-200
/dev/sdb1 3.7T 3.1T 586G 85% /var/lib/ceph/osd/ceph-152
My crush tunables are set to optimal.
Here is my pool info:
pool 52 'poolname' replicated size 2 min_size 1 crush_ruleset 2
object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 162677 flags
hashpspool stripe_width 0
Here is the crushmap for that specific pool
-10 130.5 pool poolname
-8 43.56 host serv007
144 3.63 osd.144 up 1
145 3.63 osd.145 up 1
146 3.63 osd.146 up 1
147 3.63 osd.147 up 1
148 3.63 osd.148 up 1
149 3.63 osd.149 up 1
150 3.63 osd.150 up 1
151 3.63 osd.151 up 1
153 3.63 osd.153 up 1
154 3.63 osd.154 up 1
155 3.63 osd.155 up 1
202 3.63 osd.202 up 1
-9 43.56 host serv008
156 3.63 osd.156 up 1
157 3.63 osd.157 up 1
158 3.63 osd.158 up 1
159 3.63 osd.159 up 1
160 3.63 osd.160 up 1
161 3.63 osd.161 up 1
162 3.63 osd.162 up 1
163 3.63 osd.163 up 1
164 3.63 osd.164 up 1
165 3.63 osd.165 up 1
166 3.63 osd.166 up 1
167 3.63 osd.167 up 1
-15 43.43 host serv012
193 3.63 osd.193 up 1
197 3.63 osd.197 up 1
192 3.63 osd.192 up 1
198 3.63 osd.198 up 1
200 3.5 osd.200 up 1
196 3.63 osd.196 up 1
191 3.63 osd.191 up 1
199 3.63 osd.199 up 1
190 3.63 osd.190 up 1
195 3.63 osd.195 up 1
194 3.63 osd.194 up 1
152 3.63 osd.152 up 1
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com