Re: Ceph-ansible: add a new HDD to an already provisioned WAL device

Hi Guillaume,

Thank you very much for the quick clarification and the detailed workaround.

We’ll check whether manual migration is feasible for our setup, given the time it would take. Alternatively, we’re looking into completely redeploying all affected OSDs (i.e. shrinking the cluster with ceph-ansible and re-provisioning all the devices).
Thanks as well for the hint about the flags. In both cases it makes sense to prevent unnecessary data migration during the procedure (by setting noout, norecover, etc.).
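
For example, something along these lines (just a sketch of what we have in mind, assuming the flags are set from a node with an admin keyring):

```
# Before the maintenance: keep OSDs "in" and avoid rebalancing/recovery churn.
ceph osd set noout
ceph osd set norebalance
ceph osd set norecover

# ... migrate / redeploy the affected OSDs ...

# Afterwards: clear the flags again so the cluster can settle.
ceph osd unset noout
ceph osd unset norebalance
ceph osd unset norecover
```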

Cheers, Len


Guillaume Abrioux wrote on 2023-01-18:
> Hi Len,

> Indeed, this is not possible with ceph-ansible.
> One option would be to do it manually with `ceph-volume lvm migrate`:

> (Note that it can be tedious given that it requires a lot of manual
> operations, especially for clusters with a large number of OSDs.)

> Initial setup:
> ```
> # cat group_vars/all
> ---
> devices:
>   - /dev/sdb
> dedicated_devices:
>   - /dev/sda
> ```

> ```
> [root@osd0 ~]# lsblk
> NAME                                                                                                   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda                                                                                                      8:0    0  50G  0 disk
> `-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a     252:1    0  50G  0 lvm
> sdb                                                                                                      8:16   0  50G  0 disk
> `-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9  252:0    0  50G  0 lvm
> sdc                                                                                                      8:32   0  50G  0 disk
> sdd                                                                                                      8:48   0  50G  0 disk
> vda                                                                                                    253:0    0  11G  0 disk
> `-vda1                                                                                                 253:1    0  10G  0 part /
> ```

> ```
> [root@osd0 ~]# lvs
>   LV                                             VG                                        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   osd-block-70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 ceph-4c77295c-28a5-440a-9561-b9dc4c814e36 -wi-ao---- <50.00g
>   osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a    ceph-8d085f45-939c-4a65-a577-d21fa146d7d6 -wi-ao---- <50.00g
> [root@osd0 ~]# vgs
>   VG                                        #PV #LV #SN Attr   VSize   VFree
>   ceph-4c77295c-28a5-440a-9561-b9dc4c814e36   1   1   0 wz--n- <50.00g    0
>   ceph-8d085f45-939c-4a65-a577-d21fa146d7d6   1   1   0 wz--n- <50.00g    0
> ```
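
> As a side note, if you need to look up the OSD fsid that the `ceph-volume lvm migrate` calls below expect, it can be read from the OSD itself (this is just one way of doing it):
> ```
> [root@osd0 ~]# ceph-volume lvm list                 # prints "osd id" / "osd fsid" per OSD
> [root@osd0 ~]# cat /var/lib/ceph/osd/ceph-0/fsid
> ```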

> Create a tmp LV on your new device:
> ```
> [root@osd0 ~]# pvcreate /dev/sdd
>   Physical volume "/dev/sdd" successfully created.
> [root@osd0 ~]# vgcreate vg_db_tmp /dev/sdd
>   Volume group "vg_db_tmp" successfully created
> [root@osd0 ~]# lvcreate -n db-sdb -l 100%FREE vg_db_tmp
>   Logical volume "db-sdb" created.
> ```

> Stop your OSD:
> ```
> [root@osd0 ~]# systemctl stop ceph-osd@0
> ```
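
> (Optional: before migrating, you can double-check that the daemon is really stopped, e.g.:)
> ```
> [root@osd0 ~]# systemctl is-active ceph-osd@0
> ```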

> Migrate the DB to the tmp LV:
> ```
> [root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target vg_db_tmp/db-sdb
> --> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/vg_db_tmp/db-sdb
> Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
> Running command: /bin/chown -R ceph:ceph /dev/dm-2
> --> Migration successful.
> ```
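
> (Not strictly required, but you can verify that the OSD's block.db symlink now points at the temporary LV, for instance:)
> ```
> [root@osd0 ~]# ls -l /var/lib/ceph/osd/ceph-0/block.db
> ```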

> Remove the old LV:
> ```
> [root@osd0 ~]# lvremove /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
> Do you really want to remove active logical volume ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a? [y/n]: y
>   Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" successfully removed.
> ```

> Recreate a smaller LV.
> In my simplified case, I want to go from 1 to 2 DB devices, which means the
> old LV has to be resized down to 1/2:
> ```
> [root@osd0 ~]# lvcreate -n osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a -l 50%FREE ceph-8d085f45-939c-4a65-a577-d21fa146d7d6
>   Logical volume "osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a" created.
> ```
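
> (Before migrating back, it is worth confirming the new, smaller size with lvs, e.g.:)
> ```
> [root@osd0 ~]# lvs ceph-8d085f45-939c-4a65-a577-d21fa146d7d6
> ```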

> Migrate the DB to the new LV:
> ```
> [root@osd0 ~]# ceph-volume lvm migrate --osd-id 0 --osd-fsid 70fd3b96-7bb2-4ae3-a0f8-4d18748186f9 --from db --target ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
> --> Migrate to new, Source: ['--devs-source', '/var/lib/ceph/osd/ceph-0/block.db'] Target: /dev/ceph-8d085f45-939c-4a65-a577-d21fa146d7d6/osd-db-cd34400d-daf2-450f-97d9-d561e7a43d1a
> Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
> Running command: /bin/chown -R ceph:ceph /dev/dm-1
> --> Migration successful.
> ```

> Restart the OSD:
> ```
> [root@osd0 ~]# systemctl start ceph-osd@0
> ```
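
> (To confirm the OSD came back up and reports the resized DB device, something like the following can help; the exact metadata field names may vary by release:)
> ```
> [root@osd0 ~]# ceph osd tree
> [root@osd0 ~]# ceph osd metadata 0 | grep bluefs_db
> ```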

> Remove the tmp LV/VG/PV:
> ```
> [root@osd0 ~]# lvremove /dev/vg_db_tmp/db-sdb
> Do you really want to remove active logical volume vg_db_tmp/db-sdb? [y/n]: y
> [root@osd0 ~]# vgremove vg_db_tmp
>   Volume group "vg_db_tmp" successfully removed
> [root@osd0 ~]# pvremove /dev/sdd
>   Labels on physical volume "/dev/sdd" successfully wiped.
> ```

> Add the new OSD (should be done by re-running the playbook):
> ```
> [root@osd0 ~]# ceph-volume lvm batch --bluestore --yes /dev/sdb /dev/sdc --db-devices /dev/sda
> --> passed data devices: 2 physical, 0 LVM
> --> relative data size: 1.0
> --> passed block_db devices: 1 physical, 0 LVM

> ... omitted output ...

> --> ceph-volume lvm activate successful for osd ID: 1
> --> ceph-volume lvm create successful for: /dev/sdc
> [root@osd0 ~]#
> ```
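
> (If you want to preview what the batch call would do before committing to it, ceph-volume can print a report instead of applying changes, e.g.:)
> ```
> [root@osd0 ~]# ceph-volume lvm batch --bluestore --report /dev/sdb /dev/sdc --db-devices /dev/sda
> ```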

> New lsblk output:
> ```
> [root@osd0 ~]# lsblk
> NAME                                                                                                   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda                                                                                                      8:0    0  50G  0 disk
> |-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--cd34400d--daf2--450f--97d9--d561e7a43d1a     252:0    0  25G  0 lvm
> `-ceph--8d085f45--939c--4a65--a577--d21fa146d7d6-osd--db--bb30e5aa--a634--4c52--8b99--a222c03c18e3     252:3    0  25G  0 lvm
> sdb                                                                                                      8:16   0  50G  0 disk
> `-ceph--4c77295c--28a5--440a--9561--b9dc4c814e36-osd--block--70fd3b96--7bb2--4ae3--a0f8--4d18748186f9  252:1    0  50G  0 lvm
> sdc                                                                                                      8:32   0  50G  0 disk
> `-ceph--5255bfbb--f133--4954--aaa8--35e2643ed491-osd--block--9e67ea46--2409--45f8--83e1--f66a42a6d9d0  252:2    0  50G  0 lvm
> sdd                                                                                                      8:48   0  50G  0 disk
> vda                                                                                                    253:0    0  11G  0 disk
> `-vda1                                                                                                 253:1    0  10G  0 part /
> ```

> If you plan to re-run the playbook, do not forget to update your group_vars
> to reflect the new topology:

> ```
> # cat group_vars/all
> ---
> devices:
>   - /dev/sdb
>   - /dev/sdc
> dedicated_devices:
>   - /dev/sda
> ```
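
> (The re-run can then be limited to the affected OSD host; the exact playbook and inventory names depend on your deployment, so treat this only as a sketch:)
> ```
> ansible-playbook -i <your-inventory> site.yml --limit osd0
> ```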

> You might want to set some OSD flags (noout, etc.) in order to avoid
> unnecessary data migration.


> Regards,

> On Tue, 17 Jan 2023 at 18:39, Len Kimms <len.kimms@xxxxxxxxxxxxxxx> wrote:

> > Hello all,
> >
> > we’ve set up a new Ceph cluster with a number of nodes which are all
> > identically configured.
> > There is one device vda which should act as WAL device for all other
> > devices. Additionally, there are four other devices vdb, vdc, vdd, vde
> > which use vda as WAL.
> > The whole cluster was set up using ceph-ansible (branch stable-7.0) and
> > Ceph version 17.2.0.
> > Device configuration in osds.yml looks as follows:
> >    devices: [/dev/vdb, /dev/vdc, /dev/vdd, /dev/vde]
> >    bluestore_wal_devices: [/dev/vda]
> > As expected, vda contains four logical WAL volumes, each 1/4 of the
> > overall vda disk size (‘ceph-ansible/group_vars/all.yml’ has the default
> > ‘block_db_size: -1’).
> >
> > After the initial setup, we’ve added an additional device vdf which should
> > become a new OSD. The new OSD should use vda for WAL as well. This means
> > the previous four WAL LVs have to be resized down to 1/5 and a new LV has
> > to be added.
> >
> > Is it possible to retroactively add a new device to an already provisioned
> > WAL device?
> >
> > We suspect that this is not possible because the ceph-bluestore-tool does
> > not provide any way to shrink an existing BlueFS device. Only expanding is
> > currently possible (
> > https://docs.ceph.com/en/quincy/man/8/ceph-bluestore-tool/).
> > Simply adding the new device to the devices list and rerunning the
> > playbook does nothing, and neither does setting only “devices: [/dev/vdf]” and
> > “bluestore_wal_devices: [/dev/vda]”. In both cases vda is rejected with
> > “Insufficient space (<10 extents) on vgs”, which makes sense because vda is
> > already fully used by the previous four OSD WALs.
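> >
> > For reference, the only related operation we found is expansion (with no shrink counterpart), roughly:
> > ```
> > ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
> > ```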
> >
> > Thanks for the help and kind regards.
> >
> >
> > Additional notes:
> > - We’re testing pre-production on an emulated cluster hence the device
> > names vdx and unusually small device sizes.
> > - The output of `lsblk` after the initial setup looks as follows:
> > ```
> > vda                                                                                                    252:0    0    8G  0 disk
> > ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--3677c354--8d7d--4db9--a2b7--68aeb8248d40   253:2    0    2G  0 lvm
> > ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--52d71122--b573--4077--9633--968c178612fd   253:4    0    2G  0 lvm
> > ├─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--2d7eb467--cfb1--4a00--8a45--273932036599   253:6    0    2G  0 lvm
> > └─ceph--36607c7f--e51c--452e--a44a--225d8d0b0aa8-osd--wal--d7b13b79--219c--4002--9e92--370dff7a5376   253:8    0    2G  0 lvm
> > vdb                                                                                                    252:16   0    8G  0 disk
> > └─ceph--49ddaa8b--5d8f--4267--85f9--5cac608ce53d-osd--block--861a53c7--ee57--4c5f--9546--1dd7cb0185ef  253:1    0    8G  0 lvm
> > vdc                                                                                                    252:32   0    5G  0 disk
> > └─ceph--1ed9ee91--e071--4ea4--9703--d56d84d9ae0a-osd--block--8aacb66a--e29b--4b7a--8ad5--a9fb1f81c6d6  253:3    0    5G  0 lvm
> > vdd                                                                                                    252:48   0    5G  0 disk
> > └─ceph--554cdd8b--e722--41a9--8f64--c09c857cc0dc-osd--block--4dee3e1b--b50d--4154--b2ff--80cadb67e2a0  253:5    0    5G  0 lvm
> > vde                                                                                                    252:64   0    5G  0 disk
> > └─ceph--5d58de32--ca55--4895--8ac7--af94ee07672e-osd--block--3f563f40--0c1e--4cca--9325--d9534cceb711  253:7    0    5G  0 lvm
> > vdf                                                                                                    252:80   0    5G  0 disk
> > ```
> > - Ceph status is happy and healthy:
> > ```
> >   cluster:
> >     id:     ff043ce8-xxxx-xxxx-xxxx-e98d073c9d09
> >     health: HEALTH_WARN
> >             mons are allowing insecure global_id reclaim
> >
> >   services:
> >     mon: 3 daemons, quorum baloo-1,baloo-2,baloo-3 (age 13m)
> >     mgr: baloo-2(active, since 5m), standbys: baloo-3, baloo-1
> >     mds: 1/1 daemons up, 1 standby
> >     osd: 24 osds: 24 up (since 4m), 24 in (since 5m)
> >     rgw: 1 daemon active (1 hosts, 1 zones)
> >
> >   data:
> >     volumes: 1/1 healthy
> >     pools:   7 pools, 177 pgs
> >     objects: 213 objects, 584 KiB
> >     usage:   98 MiB used, 138 GiB / 138 GiB avail
> >     pgs:     177 active+clean
> > ```
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >


> --

> *Guillaume Abrioux*
> Senior Software Engineer
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



