We have a small script that does CRUSH reweighting based on PGs per OSD to balance data across OSDs, and we run it right after setting up the cluster, so there is no data migration once the cluster fills up. A couple of experiences to share:

1> As suggested, it helps to choose a power-of-two PG count so that the object count per PG is even (it is quite even in our deployment, given our object and disk sizes).
2> By running the script, we aim for an even PG count per OSD (for the data pool), so that disk utilization is most likely to stay even once the cluster fills up.
3> Depending on your disk replacement procedure, you may need some extra steps to make sure the adjusted CRUSH weights persist across disk replacements.

Sage has a built-in version of this (reweight-by-pg), and here is our script:
https://github.com/guangyy/ceph_misc/blob/master/osd_crush_reweight/ceph_osd_crush_reweight.pl
A rough sketch of the idea is at the end of this message.

Hope that helps.

Thanks,
Guang

----------------------------------------
> To: ceph-users@xxxxxxxxxxxxxx
> From: pengyujian5201314@xxxxxxx
> Date: Wed, 15 Apr 2015 01:58:08 +0000
> Subject: ceph data not well distributed.
>
> I have a Ceph cluster with 125 OSDs, all with the same weight,
> but I found that the data is not well distributed:
>
> df
> Filesystem          1K-blocks      Used Available Use% Mounted on
> /dev/sda1            47929224   2066208  43405264   5% /
> udev                 16434372         4  16434368   1% /dev
> tmpfs                 6578584       728   6577856   1% /run
> none                     5120         0      5120   0% /run/lock
> none                 16446460         0  16446460   0% /run/shm
> /dev/sda6              184307     62921    107767  37% /boot
> /dev/mapper/osd-104 877797376 354662904 523134472  41% /ceph-osd/osd-104
> /dev/mapper/osd-105 877797376 596911248 280886128  69% /ceph-osd/osd-105
> /dev/mapper/osd-106 877797376 497968080 379829296  57% /ceph-osd/osd-106
> /dev/mapper/osd-107 877797376 640225368 237572008  73% /ceph-osd/osd-107
> /dev/mapper/osd-108 877797376 509972412 367824964  59% /ceph-osd/osd-108
> /dev/mapper/osd-109 877797376 581435864 296361512  67% /ceph-osd/osd-109
> /dev/mapper/osd-110 877797376 724248740 153548636  83% /ceph-osd/osd-110
> /dev/mapper/osd-111 877797376 495883796 381913580  57% /ceph-osd/osd-111
> /dev/mapper/osd-112 877797376 488635912 389161464  56% /ceph-osd/osd-112
> /dev/mapper/osd-113 877797376 613807596 263989780  70% /ceph-osd/osd-113
> /dev/mapper/osd-114 877797376 633144408 244652968  73% /ceph-osd/osd-114
> /dev/mapper/osd-115 877797376 519702956 358094420  60% /ceph-osd/osd-115
> /dev/mapper/osd-116 877797376 449834752 427962624  52% /ceph-osd/osd-116
> /dev/mapper/osd-117 877797376 641484036 236313340  74% /ceph-osd/osd-117
> /dev/mapper/osd-118 877797376 519416488 358380888  60% /ceph-osd/osd-118
> /dev/mapper/osd-119 877797376 599926788 277870588  69% /ceph-osd/osd-119
> /dev/mapper/osd-120 877797376 460384476 417412900  53% /ceph-osd/osd-120
> /dev/mapper/osd-121 877797376 646286724 231510652  74% /ceph-osd/osd-121
> /dev/mapper/osd-122 877797376 647260752 230536624  74% /ceph-osd/osd-122
> /dev/mapper/osd-123 877797376 432367436 445429940  50% /ceph-osd/osd-123
> /dev/mapper/osd-124 877797376 595846772 281950604  68% /ceph-osd/osd-124
>
> osd.104 is 41% full, but osd.110 is 83%.
> Can I move some PGs from osd.110 to osd.104 manually?
>
> Thanks!
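P.S. For anyone who would rather not read Perl, here is a rough Python sketch of the idea. This is not the actual script linked above, just an illustration: the pool id is a placeholder, the JSON field names match the Hammer-era CLI and may differ on other releases, and it prints the reweight commands instead of running them so you can review them first.

#!/usr/bin/env python
# Rough sketch: rescale each OSD's CRUSH weight so that the PG count
# per OSD (for one pool) converges toward the mean. Illustration only,
# not the script linked above.
import json
import subprocess
from collections import defaultdict

POOL_ID = 0  # placeholder: id of the data pool to balance on


def ceph_json(*args):
    # Run a ceph CLI subcommand and parse its JSON output.
    out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
    return json.loads(out)


def pgs_per_osd(pool_id):
    # Count PGs of the given pool per OSD, using the acting sets from
    # "ceph pg dump". pgids look like "<pool>.<seq>", e.g. "0.1a".
    counts = defaultdict(int)
    for pg in ceph_json("pg", "dump")["pg_stats"]:
        if int(pg["pgid"].split(".")[0]) == pool_id:
            for osd in pg["acting"]:
                counts[osd] += 1
    return counts


def crush_weights():
    # Current CRUSH weight of every OSD, from "ceph osd tree".
    tree = ceph_json("osd", "tree")
    return dict((n["id"], n["crush_weight"])
                for n in tree["nodes"] if n["type"] == "osd")


def main():
    counts = pgs_per_osd(POOL_ID)
    weights = crush_weights()
    mean = sum(counts.values()) / float(len(counts))
    for osd in sorted(counts):
        # Scaling by mean/actual nudges over-full OSDs down and
        # under-full ones up; rerun until the spread is acceptable.
        new_weight = weights[osd] * mean / counts[osd]
        print("ceph osd crush reweight osd.%d %.4f" % (osd, new_weight))


if __name__ == "__main__":
    main()

Sanity-check the printed weights, then feed the output to a shell, ideally before the pool holds data, so rebalancing is cheap.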