On Fri, Apr 05, 2019 at 02:42:53PM +0300, Igor Fedotov wrote:
> wrt Round 1 - an ability to expand block(main) device has been added to
> Nautilus,
>
> see: https://github.com/ceph/ceph/pull/25308

Oh, that's good.  Still, separate wal & db volumes may be useful for
studying the load on each of them (blktrace) or for moving db/wal to
another physical disk by means of LVM, transparently to Ceph (a rough
pvmove sketch is appended below).

> wrt Round 2:
>
> - not setting 'size' label looks like a bug although I recall I fixed it...
> Will double check.
>
> - wrong stats output is probably related to the lack of monitor restart -
> could you please try that and report back if it helps? Or even restart the
> whole cluster.. (well I understand that's a bad approach for production but
> just to verify my hypothesis)

Mon restart didn't help:

node0:~# systemctl restart ceph-mon@0
node1:~# systemctl restart ceph-mon@1
node2:~# systemctl restart ceph-mon@2
node2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
 0   hdd 0.22739  1.00000  233GiB 9.44GiB 223GiB  4.06 0.12 128
 1   hdd 0.22739  1.00000  233GiB 9.44GiB 223GiB  4.06 0.12 128
 3   hdd 0.22739        0      0B      0B     0B     0    0   0
 2   hdd 0.22739  1.00000  800GiB  409GiB 391GiB 51.18 1.51 128
                     TOTAL 1.24TiB  428GiB 837GiB 33.84
MIN/MAX VAR: 0.12/1.51  STDDEV: 26.30

Restarting the mgrs and then all Ceph daemons on all three nodes didn't
help either:

node2:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
 0   hdd 0.22739  1.00000  233GiB 9.43GiB 223GiB  4.05 0.12 128
 1   hdd 0.22739  1.00000  233GiB 9.43GiB 223GiB  4.05 0.12 128
 3   hdd 0.22739        0      0B      0B     0B     0    0   0
 2   hdd 0.22739  1.00000  800GiB  409GiB 391GiB 51.18 1.51 128
                     TOTAL 1.24TiB  428GiB 837GiB 33.84
MIN/MAX VAR: 0.12/1.51  STDDEV: 26.30

Maybe we should upgrade to v14.2.0 Nautilus instead of studying old
bugs... after all, this is a toy cluster for now.

Thank you for responding,

-- Yury
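P.S.  To make the LVM point concrete, here is a rough, untested sketch
of moving the DB and WAL volumes of osd.2 to a new disk while the OSD
keeps running; /dev/sdX (current disk) and /dev/sdY (new disk) are
hypothetical device names:

# pvcreate /dev/sdY                     <- prepare the new disk as a PV
# vgextend vg0 /dev/sdY                 <- add it to the volume group
# pvmove -n osd2db  /dev/sdX /dev/sdY   <- migrate only the DB LV's extents
# pvmove -n osd2wal /dev/sdX /dev/sdY   <- migrate only the WAL LV's extents

pvmove copies the extents through a temporary mirror and switches over
online, so the OSD keeps seeing the same /dev/vg0/osd2db and
/dev/vg0/osd2wal paths throughout.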
> On 4/5/2019 2:06 PM, Yury Shevchuk wrote:
> > Hello all!
> >
> > We have a toy 3-node Ceph cluster running Luminous 12.2.11 with one
> > bluestore osd per node.  We started with pretty small OSDs and would
> > like to be able to expand OSDs whenever needed.  We ran into two
> > issues with the expansion: one turned out to be user-serviceable,
> > while the other probably needs a look from the developers.  I will
> > describe both briefly.
> >
> > Round 1
> > ~~~~~~~
> > Trying to expand osd.2 by 1TB:
> >
> > # lvextend -L+1T /dev/vg0/osd2
> >   Size of logical volume vg0/osd2 changed from 232.88 GiB (59618 extents) to 1.23 TiB (321762 extents).
> >   Logical volume vg0/osd2 successfully resized.
> >
> > # ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2
> > inferring bluefs devices from bluestore path
> >  slot 1 /var/lib/ceph/osd/ceph-2//block
> > 1 : size 0x13a38800000 : own 0x[1bf2200000~254300000]
> > Expanding...
> > 1 : can't be expanded. Bypassing...
> > #
> >
> > It didn't work.  The explanation can be found in
> > ceph/src/os/bluestore/BlueFS.cc at line 310:
> >
> > // returns true if specified device is under full bluefs control
> > // and hence can be expanded
> > bool BlueFS::is_device_expandable(unsigned id)
> > {
> >   if (id >= MAX_BDEV || bdev[id] == nullptr) {
> >     return false;
> >   }
> >   switch(id) {
> >   case BDEV_WAL:
> >     return true;
> >
> >   case BDEV_DB:
> >     // true if DB volume is non-shared
> >     return bdev[BDEV_SLOW] != nullptr;
> >   }
> >   return false;
> > }
> >
> > So we have to use separate block.db and block.wal for the OSD to be
> > expandable.  Indeed, our OSDs were created without separate block.db
> > and block.wal, like this:
> >
> > ceph-volume lvm create --bluestore --data /dev/vg0/osd2
> >
> > Recreating osd.2 with separate block.db and block.wal:
> >
> > # ceph-volume lvm zap --destroy --osd-id 2
> > # lvcreate -L1G -n osd2wal vg0
> >   Logical volume "osd2wal" created.
> > # lvcreate -L40G -n osd2db vg0
> >   Logical volume "osd2db" created.
> > # lvcreate -L400G -n osd2 vg0
> >   Logical volume "osd2" created.
> > # ceph-volume lvm create --osd-id 2 --bluestore --data vg0/osd2 --block.db vg0/osd2db --block.wal vg0/osd2wal
> >
> > Resync takes some time, and then we have an expandable osd.2.
> >
> >
> > Round 2
> > ~~~~~~~
> > Trying to expand osd.2 from 400G to 700G:
> >
> > # lvextend -L+300G /dev/vg0/osd2
> >   Size of logical volume vg0/osd2 changed from 400.00 GiB (102400 extents) to 700.00 GiB (179200 extents).
> >   Logical volume vg0/osd2 successfully resized.
> >
> > # ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
> > inferring bluefs devices from bluestore path
> >  slot 0 /var/lib/ceph/osd/ceph-2//block.wal
> >  slot 1 /var/lib/ceph/osd/ceph-2//block.db
> >  slot 2 /var/lib/ceph/osd/ceph-2//block
> > 0 : size 0x40000000 : own 0x[1000~3ffff000]
> > 1 : size 0xa00000000 : own 0x[2000~9ffffe000]
> > 2 : size 0xaf00000000 : own 0x[3000000000~400000000]
> > Expanding...
> > #
> >
> > This time the expansion appears to work: 0xaf00000000 = 700GiB.
> >
> > However, the size in the block device label has not changed:
> >
> > # ceph-bluestore-tool show-label --dev /dev/vg0/osd2
> > {
> >     "/dev/vg0/osd2": {
> >         "osd_uuid": "a18ff7f7-0de1-4791-ba4b-f3b6d2221f44",
> >         "size": 429496729600,
> >
> > 429496729600 = 400GiB
> >
> > Worse, ceph osd df shows the added space as used, not available:
> >
> > # ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
> >  0   hdd 0.22739  1.00000  233GiB 8.06GiB 225GiB  3.46 0.13 128
> >  1   hdd 0.22739  1.00000  233GiB 8.06GiB 225GiB  3.46 0.13 128
> >  2   hdd 0.22739  1.00000  700GiB  301GiB 399GiB 43.02 1.58  64
> >                      TOTAL 1.14TiB  317GiB 849GiB 27.21
> > MIN/MAX VAR: 0.13/1.58  STDDEV: 21.43
> >
> > If I expand osd.2 by another 100G, that space also goes to the
> > "USE" column:
> >
> > node2:~# ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
> >  0   hdd 0.22739  1.00000  233GiB 8.05GiB 225GiB  3.46 0.10 128
> >  1   hdd 0.22739  1.00000  233GiB 8.05GiB 225GiB  3.46 0.10 128
> >  3   hdd 0.22739        0      0B      0B     0B     0    0   0
> >  2   hdd 0.22739  1.00000  800GiB  408GiB 392GiB 51.01 1.52 128
> >                      TOTAL 1.24TiB  424GiB 842GiB 33.51
> > MIN/MAX VAR: 0.10/1.52  STDDEV: 26.54
> >
> > So OSD expansion almost works, but not quite.  If you have had
> > better luck with bluefs-bdev-expand, could you please share your
> > story?
> >
> >
> > -- Yury
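P.P.S.  One more thought on the stale "size" label from Round 2 above.
If the set-label-key subcommand is available in this ceph-bluestore-tool
build (an untested suggestion, not verified on this cluster), the label
could in principle be brought in line with the new LV size, with osd.2
stopped; 751619276800 is 0xaf00000000, i.e. 700GiB:

# ceph-bluestore-tool show-label --dev /dev/vg0/osd2   <- "size": 429496729600 (400GiB)
# ceph-bluestore-tool set-label-key --dev /dev/vg0/osd2 -k size -v 751619276800
# ceph-bluestore-tool show-label --dev /dev/vg0/osd2   <- should now report 751619276800

Whether a corrected label alone is enough to make ceph osd df count the
added space as available is unclear; the OSD (and perhaps the mons)
would still need a restart before the stats could change.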