We have a small script that does CRUSH reweighting based on PGs per OSD to balance data across OSDs, and we run it right after setting up the cluster, so there is no data migration once the cluster fills up. A couple of experiences to share:

1> As suggested, it helps to choose a power-of-two PG count so that the object count per PG is even (it is quite even in our deployment, given our object and disk sizes).
2> By running the script, we aim for an even PG count per OSD (for the data pool), so that disk utilization is most likely to stay even once the cluster fills up.
3> Depending on your disk replacement procedure, you may need some extra steps to make sure the adjusted CRUSH weights persist across disk replacements.

Sage has a built-in version of this (reweight-by-pg), and here is our script:
https://github.com/guangyy/ceph_misc/blob/master/osd_crush_reweight/ceph_osd_crush_reweight.pl
A rough sketch of the idea is at the end of this message.

Hope that helps.

Thanks,
Guang

----------------------------------------
> To: ceph-users@xxxxxxxxxxxxxx
> From: pengyujian5201314@xxxxxxx
> Date: Wed, 15 Apr 2015 01:58:08 +0000
> Subject: ceph data not well distributed.
>
> I have a Ceph cluster with 125 OSDs, all with the same weight,
> but I found that the data is not well distributed:
>
> df
> Filesystem          1K-blocks      Used Available Use% Mounted on
> /dev/sda1            47929224   2066208  43405264   5% /
> udev                 16434372         4  16434368   1% /dev
> tmpfs                 6578584       728   6577856   1% /run
> none                     5120         0      5120   0% /run/lock
> none                 16446460         0  16446460   0% /run/shm
> /dev/sda6              184307     62921    107767  37% /boot
> /dev/mapper/osd-104 877797376 354662904 523134472  41% /ceph-osd/osd-104
> /dev/mapper/osd-105 877797376 596911248 280886128  69% /ceph-osd/osd-105
> /dev/mapper/osd-106 877797376 497968080 379829296  57% /ceph-osd/osd-106
> /dev/mapper/osd-107 877797376 640225368 237572008  73% /ceph-osd/osd-107
> /dev/mapper/osd-108 877797376 509972412 367824964  59% /ceph-osd/osd-108
> /dev/mapper/osd-109 877797376 581435864 296361512  67% /ceph-osd/osd-109
> /dev/mapper/osd-110 877797376 724248740 153548636  83% /ceph-osd/osd-110
> /dev/mapper/osd-111 877797376 495883796 381913580  57% /ceph-osd/osd-111
> /dev/mapper/osd-112 877797376 488635912 389161464  56% /ceph-osd/osd-112
> /dev/mapper/osd-113 877797376 613807596 263989780  70% /ceph-osd/osd-113
> /dev/mapper/osd-114 877797376 633144408 244652968  73% /ceph-osd/osd-114
> /dev/mapper/osd-115 877797376 519702956 358094420  60% /ceph-osd/osd-115
> /dev/mapper/osd-116 877797376 449834752 427962624  52% /ceph-osd/osd-116
> /dev/mapper/osd-117 877797376 641484036 236313340  74% /ceph-osd/osd-117
> /dev/mapper/osd-118 877797376 519416488 358380888  60% /ceph-osd/osd-118
> /dev/mapper/osd-119 877797376 599926788 277870588  69% /ceph-osd/osd-119
> /dev/mapper/osd-120 877797376 460384476 417412900  53% /ceph-osd/osd-120
> /dev/mapper/osd-121 877797376 646286724 231510652  74% /ceph-osd/osd-121
> /dev/mapper/osd-122 877797376 647260752 230536624  74% /ceph-osd/osd-122
> /dev/mapper/osd-123 877797376 432367436 445429940  50% /ceph-osd/osd-123
> /dev/mapper/osd-124 877797376 595846772 281950604  68% /ceph-osd/osd-124
>
> osd.104 is 41% full, but osd.110 is 83%.
> Can I move some PGs from osd.110 to osd.104 manually?
>
> Thanks!
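P.S. For anyone who would rather not read Perl, here is a rough Python sketch of the idea. This is not the actual script linked above, just an illustration: the pool id is a placeholder, the JSON field names match the Hammer-era CLI and may differ on other releases, and it prints the reweight commands instead of running them so you can review them first.

#!/usr/bin/env python
# Rough sketch: rescale each OSD's CRUSH weight so that the PG count
# per OSD (for one pool) converges toward the mean. Illustration only,
# not the script linked above.
import json
import subprocess
from collections import defaultdict

POOL_ID = 0  # placeholder: id of the data pool to balance on


def ceph_json(*args):
    # Run a ceph CLI subcommand and parse its JSON output.
    out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
    return json.loads(out)


def pgs_per_osd(pool_id):
    # Count PGs of the given pool per OSD, using the acting sets from
    # "ceph pg dump". pgids look like "<pool>.<seq>", e.g. "0.1a".
    counts = defaultdict(int)
    for pg in ceph_json("pg", "dump")["pg_stats"]:
        if int(pg["pgid"].split(".")[0]) == pool_id:
            for osd in pg["acting"]:
                counts[osd] += 1
    return counts


def crush_weights():
    # Current CRUSH weight of every OSD, from "ceph osd tree".
    tree = ceph_json("osd", "tree")
    return dict((n["id"], n["crush_weight"])
                for n in tree["nodes"] if n["type"] == "osd")


def main():
    counts = pgs_per_osd(POOL_ID)
    weights = crush_weights()
    mean = sum(counts.values()) / float(len(counts))
    for osd in sorted(counts):
        # Scaling by mean/actual nudges over-full OSDs down and
        # under-full ones up; rerun until the spread is acceptable.
        new_weight = weights[osd] * mean / counts[osd]
        print("ceph osd crush reweight osd.%d %.4f" % (osd, new_weight))


if __name__ == "__main__":
    main()

Sanity-check the printed weights, then feed the output to a shell, ideally before the pool holds data, so rebalancing is cheap.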