Hello,

On Tue, 3 Jan 2017 13:08:50 +0200 Yair Magnezi wrote:

> Hello cephers
> We're running Firefly (9.2.1)

One of these two is wrong: you're either running Firefly (0.8.x, old and
unsupported) or Infernalis (9.2.x, non-LTS and thus also unsupported).

> I'm trying to rebalance our cluster's OSDs, and for some reason it looks
> like the rebalance is going the wrong way:

A "ceph osd tree" would be helpful for starters.

> What I'm trying to do is to reduce the load on osd.14 (ceph osd crush
> reweight osd.14 0.75), but what I see is that the backfill process is
> moving PGs to osd.29, which is also 86% full.
> I wonder why CRUSH doesn't map to the less occupied OSDs (3, 4, 6 for
> example).
> Any input is much appreciated.

CRUSH isn't particularly deterministic from a human perspective, and data
movements will often involve steps that are not anticipated.

CRUSH also does NOT know or take into account the utilization of OSDs;
only their weights count.

If you're seeing extreme imbalances, RAISE the weight of the least
utilized OSDs first (and in very small increments, until you get a feeling
for things). Do this in a manner that keeps the weights of the hosts more
or less the same in the end.

Christian

>
> 2017-01-03 05:59:20.877705 7f3e6a0d6700  0 log_channel(cluster) log [INF] : 2.2cb starting backfill to osd.29 from (0'0,0'0] MAX to 131306'8029954
> 2017-01-03 05:59:20.877841 7f3e670d0700  0 log_channel(cluster) log [INF] : 2.30d starting backfill to osd.10 from (0'0,0'0] MAX to 131306'8721158
> 2017-01-03 05:59:31.374323 7f3e356b0700  0 -- 10.63.4.3:6826/3125306 >> 10.63.4.5:6821/3162046 pipe(0x7f3e9d513000 sd=322 :6826 s=0 pgs=0 cs=0 l=0 c=0x7f3ea72b5de0).accept connect_seq 1605 vs existing 1605 state standby
> 2017-01-03 05:59:31.374440 7f3e356b0700  0 -- 10.63.4.3:6826/3125306 >> 10.63.4.5:6821/3162046 pipe(0x7f3e9d513000 sd=322 :6826 s=0 pgs=0 cs=0 l=0 c=0x7f3ea72b5de0).accept connect_seq 1606 vs existing 1605 state standby
> ^C
> root@ecprdbcph03-opens:/var/log/ceph# df -h
> Filesystem                           Size  Used Avail Use% Mounted on
> udev                                  32G  4.0K   32G   1% /dev
> tmpfs                                6.3G  1.4M  6.3G   1% /run
> /dev/dm-1                            106G  4.1G   96G   5% /
> none                                 4.0K     0  4.0K   0% /sys/fs/cgroup
> none                                 5.0M     0  5.0M   0% /run/lock
> none                                  32G     0   32G   0% /run/shm
> none                                 100M     0  100M   0% /run/user
> /dev/sdk2                            465M   50M  391M  12% /boot
> /dev/sdk1                            512M  3.4M  509M   1% /boot/efi
> ec-mapr-prd:/mapr/ec-mapr-prd/homes  262T  143T  119T  55% /export/home
> /dev/sde1                            889G  640G  250G  72% /var/lib/ceph/osd/ceph-3
> /dev/sdf1                            889G  656G  234G  74% /var/lib/ceph/osd/ceph-4
> /dev/sdg1                            889G  583G  307G  66% /var/lib/ceph/osd/ceph-6
> /dev/sda1                            889G  559G  331G  63% /var/lib/ceph/osd/ceph-8
> /dev/sdb1                            889G  651G  239G  74% /var/lib/ceph/osd/ceph-10
> /dev/sdc1                            889G  751G  139G  85% /var/lib/ceph/osd/ceph-12
> /dev/sdh1                            889G  759G  131G  86% /var/lib/ceph/osd/ceph-14
> /dev/sdi1                            889G  763G  127G  86% /var/lib/ceph/osd/ceph-16
> /dev/sdj1                            889G  732G  158G  83% /var/lib/ceph/osd/ceph-18
> /dev/sdd1                            889G  756G  134G  86% /var/lib/ceph/osd/ceph-29
> root@ecprdbcph03-opens:/var/log/ceph#
>
> Thanks
>
> Yair Magnezi
> Storage & Data Protection TL // Kenshoo

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
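
P.S. For reference, a minimal sketch of the incremental-reweight approach
described above. It uses osd.3, osd.4 and osd.6 (the least utilized OSDs in
the df output) as examples; the current weight of roughly 0.87 for an 889G
OSD, the 0.90 target and the step size are assumptions for illustration,
not values read from this cluster:

    # Check the current CRUSH weights and per-OSD utilization first
    ceph osd tree
    ceph osd df        # per-OSD utilization; available from Hammer onward

    # Raise the CRUSH weight of the least utilized OSDs by a small amount
    # (illustrative values: e.g. from ~0.87 to 0.90, a few percent at a time)
    ceph osd crush reweight osd.3 0.90
    ceph osd crush reweight osd.4 0.90
    ceph osd crush reweight osd.6 0.90

    # Let backfill finish and re-check before the next adjustment,
    # keeping the per-host weight totals roughly equal
    ceph -s
    df -h | grep ceph

There is also "ceph osd reweight-by-utilization", but note that it adjusts
the temporary override reweight (0..1) rather than the CRUSH weight, so its
effect is different from the crush reweight commands above.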