Hello all!

We have a toy 3-node Ceph cluster running Luminous 12.2.11 with one
bluestore OSD per node.  We started with pretty small OSDs and would
like to be able to expand them whenever needed.  We ran into two
issues with the expansion: one turned out to be user-serviceable,
while the other probably needs a developer's look.  I will describe
both briefly.

Round 1
~~~~~~~

Trying to expand osd.2 by 1TB:

# lvextend -L+1T /dev/vg0/osd2
  Size of logical volume vg0/osd2 changed from 232.88 GiB (59618 extents) to 1.23 TiB (321762 extents).
  Logical volume vg0/osd2 successfully resized.

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-2//block
1 : size 0x13a38800000 : own 0x[1bf2200000~254300000]
Expanding...
1 : can't be expanded. Bypassing...
#

It didn't work.  The explanation can be found in
ceph/src/os/bluestore/BlueFS.cc at line 310:

// returns true if specified device is under full bluefs control
// and hence can be expanded
bool BlueFS::is_device_expandable(unsigned id)
{
  if (id >= MAX_BDEV || bdev[id] == nullptr) {
    return false;
  }
  switch(id) {
  case BDEV_WAL:
    return true;
  case BDEV_DB:
    // true if DB volume is non-shared
    return bdev[BDEV_SLOW] != nullptr;
  }
  return false;
}

So we have to use separate block.db and block.wal for an OSD to be
expandable.  Indeed, our OSDs were created without separate block.db
and block.wal, like this:

    ceph-volume lvm create --bluestore --data /dev/vg0/osd2

Recreating osd.2 with separate block.db and block.wal:

# ceph-volume lvm zap --destroy --osd-id 2
# lvcreate -L1G -n osd2wal vg0
  Logical volume "osd2wal" created.
# lvcreate -L40G -n osd2db vg0
  Logical volume "osd2db" created.
# lvcreate -L400G -n osd2 vg0
  Logical volume "osd2" created.
# ceph-volume lvm create --osd-id 2 --bluestore --data vg0/osd2 \
      --block.db vg0/osd2db --block.wal vg0/osd2wal

Resync takes some time, and then we have an expandable osd.2.

Round 2
~~~~~~~

Trying to expand osd.2 from 400G to 700G:

# lvextend -L+300G /dev/vg0/osd2
  Size of logical volume vg0/osd2 changed from 400.00 GiB (102400 extents) to 700.00 GiB (179200 extents).
  Logical volume vg0/osd2 successfully resized.

# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
inferring bluefs devices from bluestore path
 slot 0 /var/lib/ceph/osd/ceph-2//block.wal
 slot 1 /var/lib/ceph/osd/ceph-2//block.db
 slot 2 /var/lib/ceph/osd/ceph-2//block
0 : size 0x40000000 : own 0x[1000~3ffff000]
1 : size 0xa00000000 : own 0x[2000~9ffffe000]
2 : size 0xaf00000000 : own 0x[3000000000~400000000]
Expanding...
#

This time the expansion appears to work: 0xaf00000000 = 700GiB.
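Spelling out that arithmetic in plain shell (no Ceph involved), for
anyone who wants to double-check the hex:

# printf '%d\n' 0xaf00000000
751619276800
# echo $((700 * 1024**3))
751619276800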
However, the size in the block device label has not changed:

# ceph-bluestore-tool show-label --dev /dev/vg0/osd2
{
    "/dev/vg0/osd2": {
        "osd_uuid": "a18ff7f7-0de1-4791-ba4b-f3b6d2221f44",
        "size": 429496729600,

429496729600 = 400GiB, i.e. the size before expansion.

Worse, ceph osd df shows the added space as used, not available:

# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
 0   hdd 0.22739  1.00000  233GiB 8.06GiB 225GiB  3.46 0.13 128
 1   hdd 0.22739  1.00000  233GiB 8.06GiB 225GiB  3.46 0.13 128
 2   hdd 0.22739  1.00000  700GiB  301GiB 399GiB 43.02 1.58  64
                    TOTAL 1.14TiB  317GiB 849GiB 27.21
MIN/MAX VAR: 0.13/1.58 STDDEV: 21.43

If I expand osd.2 by another 100G, that space also goes to the USE
column:

pier42:~# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL  %USE  VAR  PGS
 0   hdd 0.22739  1.00000  233GiB 8.05GiB 225GiB  3.46 0.10 128
 1   hdd 0.22739  1.00000  233GiB 8.05GiB 225GiB  3.46 0.10 128
 3   hdd 0.22739        0      0B      0B     0B     0    0   0
 2   hdd 0.22739  1.00000  800GiB  408GiB 392GiB 51.01 1.52 128
                    TOTAL 1.24TiB  424GiB 842GiB 33.51
MIN/MAX VAR: 0.10/1.52 STDDEV: 26.54

So OSD expansion almost works, but not quite.  If you have had better
luck with bluefs-bdev-expand, could you please share your story?

--
Yury
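P.S.  A possible workaround I have not tried yet: ceph-bluestore-tool
also has a set-label-key command, so in theory the stale "size" field
in the label could be rewritten by hand while the OSD is stopped.  The
sketch below assumes a systemd deployment and uses our 700GiB byte
count from above; untested here, so use at your own risk:

# systemctl stop ceph-osd@2
# ceph-bluestore-tool set-label-key --dev /dev/vg0/osd2 -k size -v 751619276800
# ceph-bluestore-tool show-label --dev /dev/vg0/osd2    # verify the new size
# systemctl start ceph-osd@2

Whether ceph osd df would then report the added space as available, I
don't know yet; I would be glad to hear from anyone who has tried it.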