Hi Len,

Indeed, this is not possible with ceph-ansible. One option would be to do it manually with `ceph-volume lvm migrate`.
(Note that this can be tedious given that it requires a lot of manual operations, especially on clusters with a large number of OSDs.)

Initial setup:

```
# cat group_vars/all
---
devices:
  - /dev/sdb
dedicated_devices:
  - /dev/sda
```

```
[root@osd0 ~]# lsblk
NAME                                                                                                   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                                                                                      8:0    0  50G  0 disk
`-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a     252:1    0  50G  0 lvm
sdb                                                                                                      8:16   0  50G  0 disk
`-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9  252:0    0  50G  0 lvm
sdc                                                                                                      8:32   0  50G  0 disk
sdd                                                                                                      8:48   0  50G  0 disk
vda                                                                                                    253:0    0  11G  0 disk
`-vda1                                                                                                 253:1    0  10G  0 part /
```

```
[root@osd0 ~]# lvs
  LV                                             VG                                         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 ceph-4c77295c-28a5-440a-9561-b9dc4c814e36  -wi-ao---- <50.00g
  osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a    ceph-8d085f45-939c-4a65-a577-d21fa146d7d6  -wi-ao---- <50.00g
[root@osd0 ~]# vgs
  VG                                         #PV #LV #SN Attr   VSize   VFree
  ceph-4c77295c-28a5-440a-9561-b9dc4c814e36    1   1   0 wz--n- <50.00g    0
  ceph-8d085f45-939c-4a65-a577-d21fa146d7d6    1   1   0 wz--n- <50.00g    0
```

Create a tmp LV on your new device:

```
[root@osd0 ~]# pvcreate /dev/sdd
  Physical volume "/dev/sdd" successfully created.
[root@osd0 ~]# vgcreate vg_db_tmp /dev/sdd
  Volume group "vg_db_tmp" successfully created
[root@osd0 ~]# lvcreate -n db-sdb -l 100%FREE vg_db_tmp
  Logical volume "db-sdb" created.
```

Stop your OSD:

```
[root@osd0 ~]# systemctl stop ceph-osd@0
```

Migrate the DB to the tmp LV:

```
[root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target vg_db_tmp/db-sdb
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/vg_db_tmp/db-sdb
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-2
--> Migration successful.
```

Remove the old LV:

```
[root@osd0 ~]# lvremove /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
Do you really want to remove active logical volume ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a? [y/n]: y
  Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" successfully removed.
```

Recreate a smaller LV. In my simplified case, I want to go from 1 to 2 DB devices, which means the old LV has to be resized down to 1/2 of the device:

```
[root@osd0 ~]# lvcreate -n osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a -l 50%FREE ceph-8d085f45-939c-4a65-a577-d21fa146d7d6
  Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" created.
```

Migrate the DB to the new LV:

```
[root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
--> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
Running command: /bin/chown -R ceph:ceph /dev/dm-1
--> Migration successful.
```
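Optionally, before restarting the OSD, you can sanity-check that osd.0 is now associated with the recreated LV. This is just a quick sketch based on the example above; `ceph-volume lvm list` reports the db device for each OSD from the LV tags:

```
[root@osd0 ~]# ceph-volume lvm list                       # check the [db] entry reported for osd.0
[root@osd0 ~]# ls -l /var/lib/ceph/osd/ceph-0/block.db    # the block.db symlink in the OSD directory
```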
Restart the OSD:

```
[root@osd0 ~]# systemctl start ceph-osd@0
```

Remove the tmp LV/VG/PV:

```
[root@osd0 ~]# lvremove /dev/vg_db_tmp/db-sdb
Do you really want to remove active logical volume vg_db_tmp/db-sdb? [y/n]: y
[root@osd0 ~]# vgremove vg_db_tmp
  Volume group "vg_db_tmp" successfully removed
[root@osd0 ~]# pvremove /dev/sdd
  Labels on physical volume "/dev/sdd" successfully wiped.
```

Add the new OSD (this should be done by re-running the playbook):

```
[root@osd0 ~]# ceph-volume lvm batch --bluestore --yes /dev/sdb /dev/sdc --db-devices /dev/sda
--> passed data devices: 2 physical, 0 LVM
--> relative data size: 1.0
--> passed block_db devices: 1 physical, 0 LVM
... omitted output ...
--> ceph-volume lvm activate successful for osd ID: 1
--> ceph-volume lvm create successful for: /dev/sdc
[root@osd0 ~]#
```

New lsblk output:

```
[root@osd0 ~]# lsblk
NAME                                                                                                   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda                                                                                                      8:0    0  50G  0 disk
|-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a     252:0    0  25G  0 lvm
`-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--bb30e5aa--a634--4c52--8b99--a222c03c18e3     252:3    0  25G  0 lvm
sdb                                                                                                      8:16   0  50G  0 disk
`-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9  252:1    0  50G  0 lvm
sdc                                                                                                      8:32   0  50G  0 disk
`-ceph--5255bfbb--f133--4954--aaa8--35e2643ed491-osd--block--9e67ea46--2409--45f8--83e1--f66a42a6d9d0  252:2    0  50G  0 lvm
sdd                                                                                                      8:48   0  50G  0 disk
vda                                                                                                    253:0    0  11G  0 disk
`-vda1                                                                                                 253:1    0  10G  0 part /
```

If you plan to re-run the playbook, do not forget to update your group_vars to reflect the new topology:

```
# cat group_vars/all
---
devices:
  - /dev/sdb
  - /dev/sdc
dedicated_devices:
  - /dev/sda
```

You might want to set some OSD flags (noout, etc.) in order to avoid unnecessary data migration while the OSD is down.
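For example (a quick sketch; these are the standard cluster-wide flag commands and can be run from any node with an admin keyring):

```
[root@osd0 ~]# ceph osd set noout      # set before stopping the OSD so it is not marked out while down
[root@osd0 ~]# # ... run the migration steps above ...
[root@osd0 ~]# ceph osd unset noout    # unset once the OSD is back up and the cluster is healthy again
```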
Regards,

On Tue, 17 Jan 2023 at 18:39, Len Kimms <len.kimms@xxxxxxxxxxxxxxx> wrote:

> Hello all,
>
> we’ve set up a new Ceph cluster with a number of nodes which are all
> identically configured.
> There is one device vda which should act as WAL device for all other
> devices. Additionally, there are four other devices vdb, vdc, vdd, vde
> which use vda as WAL.
> The whole cluster was set up using ceph-ansible (branch stable-7.0) and
> Ceph version 17.2.0.
> Device configuration in osds.yml looks as follows:
> devices: [/dev/vdb, /dev/vdc, /dev/vdd, /dev/vde]
> bluestore_wal_devices: [/dev/vda]
> As expected vda contains four logical volumes for WAL each 1/4 of the
> overall vda disk size (‘ceph-ansible/group_vars/all.yml’ has default
> ‘block_db_size: -1’).
>
> After the initial setup, we’ve added an additional device vdf which should
> become a new OSD. The new OSD should use vda for WAL as well. This means
> the previous four WAL LVs have to be resized down to 1/5 and a new LV has
> to be added.
>
> Is it possible to retroactively add a new device to an already provisioned
> WAL device?
>
> We suspect that this is not possible because the ceph-bluestore-tool does
> not provide any way to shrink an existing BlueFS device. Only expanding is
> currently possible (https://docs.ceph.com/en/quincy/man/8/ceph-bluestore-tool/).
> Simply adding the new device to the devices list and rerunning the
> playbook does nothing. And so does only setting “devices: [/dev/vdf]” and
> “bluestore_wal_devices: [/dev/vda]”. In both cases vda is rejected because
> “Insufficient space (<10 extents) on vgs” which makes sense because vda is
> already fully used by the previous four OSD WALs.
>
> Thanks for the help and kind regards.
>
> Additional notes:
> - We’re testing pre-production on an emulated cluster hence the device
>   names vdx and unusually small device sizes.
> - The output of `lsblk` after the initial setup looks as follows:
> ```
> vda                                                                                                    252:0    0   8G  0 disk
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--3677c354--8d7d--4db9--a2b7--68aeb8248d40    253:2    0   2G  0 lvm
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--52d71122--b573--4077--9633--968c178612fd    253:4    0   2G  0 lvm
> ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--2d7eb467--cfb1--4a00--8a45--273932036599    253:6    0   2G  0 lvm
> └─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--d7b13b79--219c--4002--9e92--370dff7a5376    253:8    0   2G  0 lvm
> vdb                                                                                                    252:16   0   8G  0 disk
> └─ceph--49ddaa8b--5d8f--4267--85f9--5cac608ce53d-osd--block--861a53c7--ee57--4c5f--9546--1dd7cb0185ef  253:1    0   8G  0 lvm
> vdc                                                                                                    252:32   0   5G  0 disk
> └─ceph--1ed9ee91--e071--4ea4--9703--d56d84d9ae0a-osd--block--8aacb66a--e29b--4b7a--8ad5--a9fb1f81c6d6  253:3    0   5G  0 lvm
> vdd                                                                                                    252:48   0   5G  0 disk
> └─ceph--554cdd8b--e722--41a9--8f64--c09c857cc0dc-osd--block--4dee3e1b--b50d--4154--b2ff--80cadb67e2a0  253:5    0   5G  0 lvm
> vde                                                                                                    252:64   0   5G  0 disk
> └─ceph--5d58de32--ca55--4895--8ac7--af94ee07672e-osd--block--3f563f40--0c1e--4cca--9325--d9534cceb711  253:7    0   5G  0 lvm
> vdf                                                                                                    252:80   0   5G  0 disk
> ```
> - Ceph status is happy and healthy:
> ```
>   cluster:
>     id:     ff043ce8-xxxx-xxxx-xxxx-e98d073c9d09
>     health: HEALTH_WARN
>             mons are allowing insecure global_id reclaim
>
>   services:
>     mon: 3 daemons, quorum baloo-1,baloo-2,baloo-3 (age 13m)
>     mgr: baloo-2(active, since 5m), standbys: baloo-3, baloo-1
>     mds: 1/1 daemons up, 1 standby
>     osd: 24 osds: 24 up (since 4m), 24 in (since 5m)
>     rgw: 1 daemon active (1 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   7 pools, 177 pgs
>     objects: 213 objects, 584 KiB
>     usage:   98 MiB used, 138 GiB / 138 GiB avail
>     pgs:     177 active+clean
> ```

--
Guillaume Abrioux
Senior Software Engineer