Re: cephadm: How to replace failed HDD where DB is on SSD


 



On 27.05.2021 11:53, Eugen Block wrote:
> This test was on ceph version 15.2.8.
>
> On Pacific (ceph version 16.2.4) this also works for me for initial
> deployment of an entire host:
>
> +---------+-------------+----------+----------+----------+-----+
> |SERVICE  |NAME         |HOST      |DATA      |DB        |WAL  |
> +---------+-------------+----------+----------+----------+-----+
> |osd      |ssd-hdd-mix  |pacific1  |/dev/vdb  |/dev/vdd  |-    |
> |osd      |ssd-hdd-mix  |pacific1  |/dev/vdc  |/dev/vdd  |-    |
> +---------+-------------+----------+----------+----------+-----+
>
> But it doesn't work if I remove one OSD, just like you describe. This
> is what ceph-volume reports:
>
> ---snip---
> [ceph: root@pacific1 /]# ceph-volume lvm batch --report /dev/vdc
> --db-devices /dev/vdd --block-db-size 3G
> --> passed data devices: 1 physical, 0 LVM
> --> relative data size: 1.0
> --> passed block_db devices: 1 physical, 0 LVM
> --> 1 fast devices were passed, but none are available
>
> Total OSDs: 0
>
>   Type            Path
>     LV Size         % of device
> ---snip---
>
> I know that this has already worked in Octopus, I did test it
> successfully not long ago.

Thank you for trying; so it looks like a bug.
Searching through the issue tracker I found a few issues related to replacing OSDs, but it doesn't look like they get much attention.


I tried to find a way to add the disk manually. I didn't find any documentation about it, but by reading the source code and some tracker issues, and with some trial and error, I ended up with the following.

Since the DB LV on the SSD was deleted, I recreated it with the same name:

# lvcreate -l 91570 -n osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69 ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b
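If you don't have the original size at hand, one way to double-check the extent count for the -l option is to look at the surviving DB LVs in the same VG, assuming they were all created with the same size:

# vgdisplay ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b
# lvdisplay ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b

vgdisplay shows the free PE in the VG and lvdisplay shows "Current LE" for each existing DB LV, which gives you the number to pass to -l.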

In "cephadm shell"
# cephadm shell
# ceph auth get client.bootstrap-osd >/var/lib/ceph/bootstrap-osd/ceph.keyring # ceph-volume lvm prepare --bluestore --no-systemd --data /dev/sdt --block.db ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69
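The OSD id and OSD fsid needed in the next steps can be read back from ceph-volume after the prepare, for example:

# ceph-volume lvm list /dev/sdt

which lists "osd id" and "osd fsid" for the newly prepared OSD (178 and 9227e8ae-92eb-429e-9c7f-d4a2b75afb8e in my case).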


You need a JSON file for "cephadm deploy":
# printf '{\n"config": "%s",\n"keyring": "%s"\n}\n' "$(ceph config generate-minimal-conf | sed -e ':a;N;$!ba;s/\n/\\n/g' -e 's/\t/\\t/g' -e 's/$/\\n/')" "$(ceph auth get osd.178 | head -n 2 | sed -e ':a;N;$!ba;s/\n/\\n/g' -e 's/\t/\\t/g' -e 's/$/\\n/')" >config-osd.178.json
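For reference, the resulting file should look roughly like this (mon address and key are placeholders here):

# cat config-osd.178.json
{
"config": "# minimal ceph.conf for 3614abcc-201c-11eb-995a-2794bcc75ae0\n[global]\n\tfsid = 3614abcc-201c-11eb-995a-2794bcc75ae0\n\tmon_host = [v2:192.168.0.1:3300/0,v1:192.168.0.1:6789/0]\n",
"keyring": "[osd.178]\n\tkey = AQ...==\n"
}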


Exit the cephadm shell and run:
# cephadm --image ceph:v15.2.9 deploy --fsid 3614abcc-201c-11eb-995a-2794bcc75ae0 --config-json /var/lib/ceph/3614abcc-201c-11eb-995a-2794bcc75ae0/home/config-osd.178.json --osd-fsid 9227e8ae-92eb-429e-9c7f-d4a2b75afb8e
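To verify that the daemon actually came up, something like this should do:

# cephadm ls | grep -A4 '"osd.178"'
# systemctl status ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.178.service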


And the OSD is back. The VG name on the HDD is missing "block" in its name, but that's just a cosmetic thing, so I leave it as is.

LV                                             VG                                              Attr       LSize
osd-block-9227e8ae-92eb-429e-9c7f-d4a2b75afb8e ceph-46f42262-d3dc-4dc3-8952-eec3e4a2c178       -wi-ao---- 12.47t
osd-block-2da790bc-a74c-41da-8772-3b8aac77001c ceph-block-1b5ad7e7-2e24-4315-8a05-7439ab782b45 -wi-ao---- 12.47t

The first one is the new OSD and the second one is one that cephadm itself created.
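If in doubt about which LV belongs to which OSD, you can also check the tags that ceph-volume sets on the LVs, something like:

# lvs -o lv_name,vg_name,lv_tags | grep 'ceph.osd_id'

which shows the osd_id and osd_fsid tags for both of them.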


--
Kai Stian Olstad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


