Hello,

On Tue, 3 Jan 2017 15:47:09 +0200 Yair Magnezi wrote:

> Hello
>
> 1) Does the reweight / load balancing take place only within the same
> node?

Not in general, but it could certainly happen if the change is small and
involves only a single PG, for example.

> 2) I'm raising the target OSD weight but nothing is happening. I expect
> to see some data movement, but there is none; only when decreasing the
> weight do I see backfilling take place. Is this normal?
>
Your log snippets are from OSD logs, where these backfill operations are
not logged.
Do a "watch ceph -s" in a separate window during these ops and/or look at
the "ceph.log" on a MON node.

That said, if the change in weight is very small, nothing might happen.
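Roughly like this, for example (the ceph.log path below is the usual
default, adjust it if your installation differs):

  # on a MON node, in a separate window:
  watch ceph -s
  # and/or follow the cluster log; backfill-related messages (like the
  # "starting backfill" lines from your very first mail) end up there:
  tail -f /var/log/ceph/ceph.log | grep -i backfill
  # "ceph -w" streams the same cluster log, if you prefer that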
Christian

> ceph osd tree | grep 50
>  50  0.89999          osd.50    up  1.00000  1.00000
>
> root@ecprdbcph04-opens:/var/log/ceph# ceph osd df | grep 50
>  14  0.75000  1.00000  888G  694G  194G  78.10  0.99  119
>  36  0.86800  1.00000  888G  608G  279G  68.50  0.87  146
> *50  0.84999  1.00000  888G  520G  368G  58.51  0.74  122*
>  52  0.86800  1.00000  888G  650G  238G  73.16  0.93  144
>  37  0.86800  1.00000  888G  650G  238G  73.19  0.93  134
>
> root@ecprdbcph04-opens:/var/log/ceph# ceph osd crush reweight osd.50 0.90
> reweighted item id 50 name 'osd.50' to 0.9 in crush map
>
> 2017-01-03 08:32:39.532287 7f1a42319700  0 -- 10.63.4.18:6838/84978 >> 10.63.4.18:6814/81943 pipe(0x7f1a976fa000 sd=382 :6838 s=0 pgs=0 cs=0 l=0 c=0x7f1a9ef04840).accept connect_seq 12 vs existing 12 state standby
> 2017-01-03 08:32:39.532353 7f1a42319700  0 -- 10.63.4.18:6838/84978 >> 10.63.4.18:6814/81943 pipe(0x7f1a976fa000 sd=382 :6838 s=0 pgs=0 cs=0 l=0 c=0x7f1a9ef04840).accept connect_seq 13 vs existing 12 state standby
> 2017-01-03 08:32:39.573405 7f1a3cd25700  0 -- 10.63.4.18:6838/84978 >> 10.63.4.18:6842/85573 pipe(0x7f1a9606f000 sd=475 :6838 s=0 pgs=0 cs=0 l=0 c=0x7f1a9ef06d60).accept connect_seq 11 vs existing 11 state standby
> 2017-01-03 08:32:39.573457 7f1a3cd25700  0 -- 10.63.4.18:6838/84978 >> 10.63.4.18:6842/85573 pipe(0x7f1a9606f000 sd=475 :6838 s=0 pgs=0 cs=0 l=0 c=0x7f1a9ef06d60).accept connect_seq 12 vs existing 11 state standby
>
> Thanks
>
> Yair Magnezi
> Storage & Data Protection TL // Kenshoo
> Office +972 7 32862423 // Mobile +972 50 575-2955
>
> On Tue, Jan 3, 2017 at 3:17 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> > Hello,
> >
> > On Tue, 3 Jan 2017 14:57:16 +0200 Yair Magnezi wrote:
> >
> > > Hello Christian.
> > > Sorry for my mistake, it's Infernalis we're running (9.2.1).
> >
> > With the docs being down I'm not certain, but that isn't the latest
> > Infernalis AFAIR.
> > But before any upgrades, you want that cluster to be stable and healthy.
> >
> > > our tree looks like this -->
> >
> > Thanks, so 6 nodes, no corner cases here then.
> >
> > "ceph osd df" as well, but I assume from the original mail that all your
> > OSDs are the same size.
> >
> > [snip]
> >
> > > we have an ongoing capacity issue as you can see below (although we're
> > > only using less than 80%)
> >
> > That's getting pretty close to the limits (with the default values), as
> > Ceph really isn't very good at keeping things balanced.
> >
> > > root@ecprdbcph01-opens:/var/lib/ceph/osd/ceph-11/current# ceph df
> > > GLOBAL:
> > >     SIZE       AVAIL      RAW USED     %RAW USED
> > >     53329G     11219G     42110G       78.96
> > >
> > > osd.12 is near full at 85%
> > > osd.16 is near full at 85%
> > > osd.17 is near full at 87%
> > > osd.19 is near full at 85%
> > > osd.22 is near full at 87%
> > > osd.24 is near full at 87%
> > > osd.29 is near full at 85%
> > > osd.33 is near full at 86%
> > > osd.39 is near full at 85%
> > > osd.42 is near full at 87%
> > > osd.45 is near full at 87%
> > > osd.47 is near full at 87%
> > > osd.49 is near full at 88%
> > > osd.58 is near full at 87%
> >
> > At this number of near-full OSDs I'd strongly recommend adding more
> > OSDs/nodes, because even with a perfectly balanced cluster you'd still be
> > in trouble if a node or even a single OSD were to fail.
> >
> > > I'm trying to decrease the weight as you've suggested but it looks like
> > > we have some trouble:
> >
> > I wrote "RAISE" as in "increase" the weight of OSDs that have
> > significantly less data than others.
> >
> > > ceph osd crush reweight osd.11 0.98
> > >
> > > tail -f ceph-osd.11.log
> > >
> > > 2017-01-03 07:38:41.952538 7f9a5c7e1700  0 -- 10.63.4.1:6808/3301342 >> 10.63.4.19:6827/2264381 pipe(0x7f9ad3df4000 sd=442 :6808 s=0 pgs=0 cs=0 l=0 c=0x7f9ac2530000).accept connect_seq 34 vs existing 33 state standby
> > > 2017-01-03 07:41:46.566313 7f9a73871700  0 -- 10.63.4.1:6808/3301342 >> 10.63.4.1:6830/3303583 pipe(0x7f9ac80d5000 sd=376 :6808 s=0 pgs=0 cs=0 l=0 c=0x7f9ac2530160).accept connect_seq 4 vs existing 4 state standby
> > > 2017-01-03 07:41:46.566370 7f9a73871700  0 -- 10.63.4.1:6808/3301342 >> 10.63.4.1:6830/3303583 pipe(0x7f9ac80d5000 sd=376 :6808 s=0 pgs=0 cs=0 l=0 c=0x7f9ac2530160).accept connect_seq 5 vs existing 4 state standby
> > > 2017-01-03 07:41:46.585562 7f9a631d9700  0 -- 10.63.4.1:6808/3301342 >> 10.63.4.1:6824/3303035 pipe(0x7f9ab9940000 sd=283 :6808 s=0 pgs=0 cs=0 l=0 c=0x7f9ac2532ec0).accept connect_seq 5 vs existing 5 state standby
> > > 2017-01-03 07:41:46.585608 7f9a631d9700  0 -- 10.63.4.1:6808/3301342 >> 10.63.4.1:6824/3303035 pipe(0x7f9ab9940000 sd=283 :6808 s=0 pgs=0 cs=0 l=0 c=0x7f9ac2532ec0).accept connect_seq 6 vs existing 5 state standby
> > >
> > > In general I've also tried to use reweight-by-utilization but it
> > > doesn't seem to work so well
> >
> > Latest Hammer or Jewel supposedly have a much improved one.
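(If you do end up on a release with the improved code, it also grew a
dry-run variant. From memory, and untested on your version, it would look
roughly like this, with 110 meaning "only touch OSDs more than 10% above
the average utilization":

  ceph osd test-reweight-by-utilization 110   # only shows what would change
  ceph osd reweight-by-utilization 110        # actually applies it

On 9.2.1 only the second, cruder form exists as far as I recall.)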
> >
> > > Is there any known bug with our version? Will a restart of the OSDs
> > > solve this issue (it was mentioned in one of the forum's threads, but
> > > that was related to Firefly)?
> >
> > See above about versions; a restart shouldn't be needed, but then again
> > recent experiences do suggest that the "Windows approach" (turning it
> > off and on again) seems to help with Ceph at times, too.
> >
> > Christian
> >
> > > Many Thanks.
> > >
> > > Yair Magnezi
> > > Storage & Data Protection TL // Kenshoo
> > > Office +972 7 32862423 // Mobile +972 50 575-2955
> > >
> > > On Tue, Jan 3, 2017 at 1:41 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > >
> > > > Hello,
> > > >
> > > > On Tue, 3 Jan 2017 13:08:50 +0200 Yair Magnezi wrote:
> > > >
> > > > > Hello cephers
> > > > > We're running firefly ( 9.2.1 )
> > > >
> > > > One of these two is wrong; you're either running Firefly (0.8.x, old
> > > > and unsupported) or Infernalis (9.2.x, non-LTS and thus also
> > > > unsupported).
> > > >
> > > > > I'm trying to rebalance our cluster's OSDs and for some reason it
> > > > > looks like the rebalance is going the wrong way:
> > > >
> > > > A "ceph osd tree" would be helpful for starters.
> > > >
> > > > > What I'm trying to do is to reduce the load on osd.14 (ceph osd
> > > > > crush reweight osd.14 0.75), but what I see is that the backfill
> > > > > process is moving PGs to osd.29, which is also 86% full.
> > > > > I wonder why CRUSH doesn't map to the less occupied OSDs (3, 4, 6
> > > > > for example).
> > > > > Any input is much appreciated.
> > > >
> > > > CRUSH isn't particularly predictable from a human perspective and
> > > > data movements will often involve steps that are not anticipated.
> > > > CRUSH also does NOT know or consider the utilization of OSDs; only
> > > > their weights count.
> > > >
> > > > If you're having extreme imbalances, RAISE the weight of the least
> > > > utilized OSDs first (and in very small increments until you get a
> > > > feeling for things).
> > > > Do this in a manner that keeps the weights of the hosts more or less
> > > > the same in the end.
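(To make that advice concrete, roughly something like the following; the
target weight is made up for illustration, not taken from your crush map:

  ceph osd df | sort -nk7 | head         # %USE is the 7th column, find the emptiest OSDs
  ceph osd crush reweight osd.3 0.92     # bump one of them by a few hundredths
  watch ceph -s                          # let backfill settle, re-check before the next bump

Small steps, one or two OSDs at a time, and keep the per-host totals
roughly equal.)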
> > > > Christian
> > > >
> > > > > 2017-01-03 05:59:20.877705 7f3e6a0d6700  0 log_channel(cluster) log [INF] : *2.2cb starting backfill to osd.29 from* (0'0,0'0] MAX to 131306'8029954
> > > > > 2017-01-03 05:59:20.877841 7f3e670d0700  0 log_channel(cluster) log [INF] : 2.30d starting backfill to osd.10 from (0'0,0'0] MAX to 131306'8721158
> > > > > 2017-01-03 05:59:31.374323 7f3e356b0700  0 -- 10.63.4.3:6826/3125306 >> 10.63.4.5:6821/3162046 pipe(0x7f3e9d513000 sd=322 :6826 s=0 pgs=0 cs=0 l=0 c=0x7f3ea72b5de0).accept connect_seq 1605 vs existing 1605 state standby
> > > > > 2017-01-03 05:59:31.374440 7f3e356b0700  0 -- 10.63.4.3:6826/3125306 >> 10.63.4.5:6821/3162046 pipe(0x7f3e9d513000 sd=322 :6826 s=0 pgs=0 cs=0 l=0 c=0x7f3ea72b5de0).accept connect_seq 1606 vs existing 1605 state standby
> > > > > ^C
> > > > > root@ecprdbcph03-opens:/var/log/ceph# df -h
> > > > > Filesystem                           Size  Used Avail Use% Mounted on
> > > > > udev                                  32G  4.0K   32G   1% /dev
> > > > > tmpfs                                6.3G  1.4M  6.3G   1% /run
> > > > > /dev/dm-1                            106G  4.1G   96G   5% /
> > > > > none                                 4.0K     0  4.0K   0% /sys/fs/cgroup
> > > > > none                                 5.0M     0  5.0M   0% /run/lock
> > > > > none                                  32G     0   32G   0% /run/shm
> > > > > none                                 100M     0  100M   0% /run/user
> > > > > /dev/sdk2                            465M   50M  391M  12% /boot
> > > > > /dev/sdk1                            512M  3.4M  509M   1% /boot/efi
> > > > > ec-mapr-prd:/mapr/ec-mapr-prd/homes  262T  143T  119T  55% /export/home
> > > > > /dev/sde1                            889G  640G  250G  72% /var/lib/ceph/osd/ceph-3
> > > > > /dev/sdf1                            889G  656G  234G  74% /var/lib/ceph/osd/ceph-4
> > > > > /dev/sdg1                            889G  583G  307G  66% /var/lib/ceph/osd/ceph-6
> > > > > /dev/sda1                            889G  559G  331G  63% /var/lib/ceph/osd/ceph-8
> > > > > /dev/sdb1                            889G  651G  239G  74% /var/lib/ceph/osd/ceph-10
> > > > > /dev/sdc1                            889G  751G  139G  85% /var/lib/ceph/osd/ceph-12
> > > > > /dev/sdh1                            889G  759G  131G  86% /var/lib/ceph/osd/ceph-14
> > > > > /dev/sdi1                            889G  763G  127G  86% /var/lib/ceph/osd/ceph-16
> > > > > /dev/sdj1                            889G  732G  158G  83% /var/lib/ceph/osd/ceph-18
> > > > > /dev/sdd1                            889G  756G  134G  86% /var/lib/ceph/osd/ceph-29
> > > > > root@ecprdbcph03-opens:/var/log/ceph#
> > > > >
> > > > > Thanks
> > > > >
> > > > > Yair Magnezi
> > > > > Storage & Data Protection TL // Kenshoo
> > > >
> > > > --
> > > > Christian Balzer        Network/Systems Engineer
> > > > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > > > http://www.gol.com/
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com