Hi Eugen,
- the OS is AlmaLinux 8 with the latest updates.
- this morning I worked with ceph-volume, but it ended in a strange
final state. I was connected on host mostha1, where /dev/sdc was not
recognized. These are the steps I followed, based on the ceph-volume
documentation I've read:
[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >
/var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc
Now the lsblk command shows sdc as an OSD:
....
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5
253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632
253:5 0 232.8G 0 lvm
Then I tried to activate this OSD, but it fails because inside the
podman container I have no access to systemctl:
[ceph: root@mostha1 /]# ceph-volume lvm activate 2
45c8e92c-caf9-4fe7-9a42-7b45a0794632
.....
Running command: /usr/bin/systemctl start ceph-osd@2
stderr: Failed to connect to bus: No such file or directory
--> RuntimeError: command returned non-zero exit status: 1
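Reading the ceph-volume documentation again, I suppose the activation
could be done without systemd inside the container, or delegated to
cephadm from the host; I have not tested either of these yet, so this
is only a guess on my side:

# untested: activate the OSD inside the cephadm shell without calling systemd
ceph-volume lvm activate --no-systemd 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
# untested alternative, run from the host: let cephadm create the container
# for any OSD it finds already prepared on this node
cephadm shell -- ceph cephadm osd activate mostha1.legi.grenoble-inp.fr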
And now I have a strange status for this osd.2:
[ceph: root@mostha1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.72823 root default
-5 0.45477 host dean
0 hdd 0.22739 osd.0 up 1.00000 1.00000
4 hdd 0.22739 osd.4 up 1.00000 1.00000
-9 0.22739 host ekman
6 hdd 0.22739 osd.6 up 1.00000 1.00000
-7 0.45479 host mostha1
5 hdd 0.45479 osd.5 up 1.00000 1.00000
-3 0.59128 host mostha2
1 hdd 0.22739 osd.1 up 1.00000 1.00000
3 hdd 0.36389 osd.3 up 1.00000 1.00000
2 0 osd.2 down 0 1.00000
I've tried to destroy the OSD as you suggested, but even though the
command returns no error, this osd.2 is still listed in the tree, while
"lsblk" no longer shows /dev/sdc as a Ceph OSD device (see my note
after the outputs below).
[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero
of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
bs=1M count=10 conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net
--uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
stderr: Removing
ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632
(253:1)
stderr: Releasing logical volume
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
stderr: Archiving volume group
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
stdout: Logical volume
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632" successfully removed.
stderr: Removing physical volume "/dev/sdc" from volume group
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
stdout: Volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
successfully removed
stderr: Creating volume group backup
"/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" (seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net
--uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc
stdout: Labels on physical volume "/dev/sdc" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10
conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>
[ceph: root@mostha1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.72823 root default
-5 0.45477 host dean
0 hdd 0.22739 osd.0 up 1.00000 1.00000
4 hdd 0.22739 osd.4 up 1.00000 1.00000
-9 0.22739 host ekman
6 hdd 0.22739 osd.6 up 1.00000 1.00000
-7 0.45479 host mostha1
5 hdd 0.45479 osd.5 up 1.00000 1.00000
-3 0.59128 host mostha2
1 hdd 0.22739 osd.1 up 1.00000 1.00000
3 hdd 0.36389 osd.3 up 1.00000 1.00000
2 0 osd.2 down 0 1.00000
[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 232.9G 0 disk
|-sda1 8:1 1 3.9G 0 part /rootfs/boot
|-sda2 8:2 1 3.9G 0 part [SWAP]
`-sda3 8:3 1 225G 0 part
|-al8vg-rootvol 253:0 0 48.8G 0 lvm /rootfs
|-al8vg-homevol 253:3 0 9.8G 0 lvm /rootfs/home
|-al8vg-tmpvol 253:4 0 9.8G 0 lvm /rootfs/tmp
`-al8vg-varvol 253:5 0 19.8G 0 lvm /rootfs/var
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5
253:2 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
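Note: if I understand the zap output correctly, it only wipes the
device and its LVM metadata; the leftover osd.2 entry apparently has
to be removed from the cluster maps separately. This is what I plan to
try next (taken from the documentation, not yet tested here):

# untested: remove the stale osd.2 entry from the CRUSH map, auth database and OSD map
ceph osd purge 2 --yes-i-really-mean-it
# or the equivalent individual steps:
# ceph osd crush remove osd.2
# ceph auth del osd.2
# ceph osd rm 2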
Patrick
On 11/10/2023 at 11:00, Eugen Block wrote:
Hi,
just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help
here. From your previous output you didn't specify the --destroy flag.
Which cephadm version is installed on the host? Did you also upgrade
the OS when moving to Pacific? (Sorry if I missed that.)
Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
On 02/10/2023 at 18:22, Patrick Bégou wrote:
Hi all,
I'm still stuck with this problem.
I've deployed Octopus and all my HDDs have been set up as OSDs. Fine.
I've upgraded to Pacific and 2 OSDs have failed. They have been
automatically removed and the upgrade finished. Cluster health is
finally OK, no data loss.
But now I cannot re-add these OSDs with Pacific (I had previous
troubles with these old HDDs: I lost one OSD in Octopus and was able
to reset and re-add it).
I've tried to manually add the first OSD on the node where it is
located, following
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/
(not sure it's the best idea...), but it fails too. This node was the
one used for deploying the cluster.
[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdc
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10
conv=fsync
stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>
[ceph: root@mostha1 /]# ceph-volume lvm create --bluestore --data /dev/sdc
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
9f1eb8ee-41e6-4350-ad73-1be21234ec7c
stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 auth: unable
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)
No such file or directory
stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4e405c4d8) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 auth: unable
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)
No such file or directory
stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4e40601d0) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 auth: unable
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)
No such file or directory
stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1
AuthRegistry(0x7fb4eb8bee90) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
stderr: 2023-10-02T16:09:29.858+0000 7fb4e965c700 -1
monclient(hunting): handle_auth_bad_method server allowed_methods
[2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4e9e5d700 -1
monclient(hunting): handle_auth_bad_method server allowed_methods
[2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4e8e5b700 -1
monclient(hunting): handle_auth_bad_method server allowed_methods
[2] but i only support [1]
stderr: 2023-10-02T16:09:29.858+0000 7fb4eb8c0700 -1 monclient:
authenticate NOTE: no keyring found; disabled cephx authentication
stderr: [errno 13] RADOS permission denied (error connecting to the
cluster)
--> RuntimeError: Unable to create a new OSD id
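(Looking at this output again, the failure seems to come only from the
bootstrap-osd keyring not existing inside the cephadm shell;
presumably exporting it first would let ceph-volume authenticate,
something like the following, though I have not yet verified that this
is the whole story:

# untested: export the bootstrap-osd keyring inside the cephadm shell
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
)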
Any idea of what is wrong?
Thanks
Patrick
I'm still trying to understand what can be wrong or how to debug this
situation where Ceph cannot see the devices.
The device /dev/sdc exists:
[root@mostha1 ~]# cephadm shell lsmcli ldl
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Path | SCSI VPD 0x83 | Link Type | Serial Number | Health Status
-------------------------------------------------------------------------
/dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142 | Good
/dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
/dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494 | Good
But I cannot do anything with it:
[root@mostha1 ~]# cephadm shell ceph orch device zap
mostha1.legi.grenoble-inp.fr /dev/sdc --force
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Error EINVAL: Device path '/dev/sdc' not found on host
'mostha1.legi.grenoble-inp.fr'
This has been the case since I moved from Octopus to Pacific.
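One thing I still want to try is forcing cephadm to re-scan the
devices on this host, in case it is only the orchestrator's cached
inventory that is stale (just a guess on my side, not verified yet):

# untested: refresh the device inventory for this host before zapping again
ceph orch device ls mostha1.legi.grenoble-inp.fr --refresh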
Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx