Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

Can you check which cephadm version is installed on the host? And then please add (only the relevant) output from the cephadm.log when you run the inventory (without the --image <octopus>). Sometimes the version mismatch on the host and the one the orchestrator uses can cause some disruptions. You could try the same with the latest cephadm you have in /var/lib/ceph/${fsid}/ (ls -lrt /var/lib/ceph/${fsid}/cephadm.*). I mentioned that in this thread [1]. So you could try the following:

$ chmod +x /var/lib/ceph/{fsid}/cephadm.{latest}

$ python3 /var/lib/ceph/{fsid}/cephadm.{latest} ceph-volume inventory

Does the output differ? Paste the relevant cephadm.log from that attempt as well.
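
As a rough sketch of the suggestion above, the newest cephadm copy could be picked by modification time rather than by eye. The helper name latest_cephadm is hypothetical (it is not part of cephadm), and the fsid in the usage comment is the one reported in this thread.

```shell
# Hypothetical helper (not part of cephadm): print the most recently
# modified cephadm.* file in a cluster directory such as /var/lib/ceph/<fsid>.
latest_cephadm() {
    # ls -t sorts newest first; head keeps only the first entry.
    ls -t "$1"/cephadm.* 2>/dev/null | head -n 1
}

# Usage on the affected host:
#   bin=$(latest_cephadm /var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c)
#   python3 "$bin" ceph-volume inventory
```

If the directory holds no cephadm.* file, the helper prints nothing, so it is worth checking that the result is non-empty before running it.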

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS/

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

First, many thanks for the time spent on this problem.

"ceph osd purge 2 --force --yes-i-really-mean-it" works and cleans up all the bad status.

[root@mostha1 ~]# cephadm shell
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it
purged osd.2

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up   1.00000  1.00000
 4    hdd  0.22739          osd.4         up   1.00000  1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up   1.00000  1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up   1.00000  1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up   1.00000  1.00000
 3    hdd  0.36389          osd.3         up   1.00000  1.00000

[ceph: root@mostha1 /]# lsblk
NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    1 232.9G  0 disk
|-sda1                8:1    1   3.9G  0 part /rootfs/boot
|-sda2                8:2    1   3.9G  0 part [SWAP]
`-sda3                8:3    1   225G  0 part
  |-al8vg-rootvol   253:0    0  48.8G  0 lvm  /rootfs
  |-al8vg-homevol   253:2    0   9.8G  0 lvm  /rootfs/home
  |-al8vg-tmpvol    253:3    0   9.8G  0 lvm  /rootfs/tmp
  `-al8vg-varvol    253:4    0  19.8G  0 lvm  /rootfs/var
sdb                   8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0 lvm
sdc                   8:32   1 232.9G  0 disk

"cephadm ceph-volume inventory" returns nothing:

[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

Device Path               Size         Device nodes    rotates available Model name

[root@mostha1 ~]#

But running the same command with the 15.2.17 (Octopus) image works:

[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path               Size         rotates available Model name
/dev/sdc                  232.83 GB    True    True      SAMSUNG HE253GJ
/dev/sda                  232.83 GB    True    False     SAMSUNG HE253GJ
/dev/sdb                  465.76 GB    True    False     WDC WD5003ABYX-1

[root@mostha1 ~]#

[root@mostha1 ~]# podman images -a
REPOSITORY         TAG       IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph  v16.2.14  f13d80acdbb5  2 weeks ago    1.21 GB
quay.io/ceph/ceph  v15.2.17  93146564743f  14 months ago  1.24 GB
....
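
To compare what each image reports, one option is to count the device rows in the inventory table. This is only a minimal sketch: count_devices is a hypothetical helper, and it assumes that device rows start with /dev/, as they do in the 15.2.17 output above.

```shell
# Hypothetical helper: count the device rows in "ceph-volume inventory"
# output read from stdin. Assumes each device row starts with /dev/.
count_devices() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps the exit status zero when nothing matches.
    grep -c '^/dev/' || true
}

# Usage on the host, once per image:
#   cephadm ceph-volume inventory | count_devices
#   cephadm --image 93146564743f ceph-volume inventory | count_devices
```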


Patrick

On 11/10/2023 at 15:14, Eugen Block wrote:
Your response is a bit confusing since it seems to be mixed up with the previous answer. You still need to remove the OSD properly, that is, purge it from the CRUSH tree:

ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!)

If everything is clean (OSD has been removed, disk has been zapped, lsblk shows no LVs for that disk) you can check the inventory:

cephadm ceph-volume inventory

Please also add the output of 'ceph orch ls osd --export'.
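
The "lsblk shows no LVs for that disk" check can be scripted. A minimal sketch, assuming the lsblk options -l (list), -n (no headings) and -o (output columns); disk_state is a hypothetical helper name, not a Ceph tool.

```shell
# Hypothetical helper: read "lsblk -l -n -o NAME,TYPE <disk>" output on
# stdin and report whether any logical volume still sits on the disk.
disk_state() {
    # Column 2 is the TYPE column; "lvm" marks a logical volume.
    awk '$2 == "lvm" { busy = 1 } END { print (busy ? "in-use" : "clean") }'
}

# Usage:
#   lsblk -l -n -o NAME,TYPE /dev/sdc | disk_state
```

It only looks at LVM children; a leftover OSD entry in the CRUSH tree still has to be checked with 'ceph osd tree'.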

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

- the OS is Alma Linux 8 with the latest updates.

- this morning I worked with ceph-volume, but it ended in a strange final state. I was connected to host mostha1, where /dev/sdc was not recognized. These are the steps I followed, based on the ceph-volume documentation I read:
[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now lsblk command shows sdc as an osd:
....
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0 lvm
sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5    0 232.8G  0 lvm

Then I tried to activate this OSD, but it fails because inside the podman container I do not have access to systemctl:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
.....
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up   1.00000  1.00000
 4    hdd  0.22739          osd.4         up   1.00000  1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up   1.00000  1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up   1.00000  1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up   1.00000  1.00000
 3    hdd  0.36389          osd.3         up   1.00000  1.00000
 2               0  osd.2               down         0  1.00000

I tried to destroy the OSD as you suggested. The command returns no error, but osd.2 is still listed in the tree, even though "lsblk" no longer shows /dev/sdc as a Ceph OSD device.

[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
 stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
 stderr: Removing ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 (253:1)
 stderr: Releasing logical volume "osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
 stderr: Archiving volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
 stdout: Logical volume "osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632" successfully removed.
 stderr: Removing physical volume "/dev/sdc" from volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
 stdout: Volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" successfully removed
 stderr: Creating volume group backup "/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" (seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc
 stdout: Labels on physical volume "/dev/sdc" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0         up   1.00000  1.00000
 4    hdd  0.22739          osd.4         up   1.00000  1.00000
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6         up   1.00000  1.00000
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5         up   1.00000  1.00000
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1         up   1.00000  1.00000
 3    hdd  0.36389          osd.3         up   1.00000  1.00000
 2               0  osd.2               down         0  1.00000

[ceph: root@mostha1 /]# lsblk
NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    1 232.9G  0 disk
|-sda1                8:1    1   3.9G  0 part /rootfs/boot
|-sda2                8:2    1   3.9G  0 part [SWAP]
`-sda3                8:3    1   225G  0 part
  |-al8vg-rootvol   253:0    0  48.8G  0 lvm  /rootfs
  |-al8vg-homevol   253:3    0   9.8G  0 lvm  /rootfs/home
  |-al8vg-tmpvol    253:4    0   9.8G  0 lvm  /rootfs/tmp
  `-al8vg-varvol    253:5    0  19.8G  0 lvm  /rootfs/var
sdb                   8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:2    0 465.8G  0 lvm
sdc

Patrick
On 11/10/2023 at 11:00, Eugen Block wrote:
Hi,

Just wondering whether 'ceph-volume lvm zap --destroy /dev/sdc' would help here. In your previous output you didn't specify the --destroy flag. Which cephadm version is installed on the host? Did you also upgrade the OS when moving to Pacific? (Sorry if I missed that.)


Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

On 02/10/2023 at 18:22, Patrick Bégou wrote:
Hi all,

still stuck with this problem.

I deployed Octopus and all my HDDs were set up as OSDs. Fine.
I upgraded to Pacific and 2 OSDs failed. They were automatically removed and the upgrade finished. Cluster health is finally OK, no data loss.

But now I cannot re-add these OSDs with Pacific (I had previous troubles with these old HDDs: I lost one OSD in Octopus and was able to reset and re-add it).

I've tried manually to add the first osd on the node where it is located, following https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ (not sure it's the best idea...) but it fails too. This node was the one used for deploying the cluster.

[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdc
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>


[ceph: root@mostha1 /]# ceph-volume lvm create --bluestore --data /dev/sdc
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e405c4d8) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e40601d0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 AuthRegistry(0x7fb4eb8bee90) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e965c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e9e5d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e8e5b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4eb8c0700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id
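
The stderr lines above point at a missing /var/lib/ceph/bootstrap-osd/ceph.keyring inside the shell container. A small sketch of a pre-flight check follows; keyring_status is a hypothetical helper name, and the export command in the usage comment is the one Patrick ran in his 11/10/2023 follow-up in this thread.

```shell
# Hypothetical helper: report whether a keyring file exists and is non-empty.
keyring_status() {
    # [ -s FILE ] is true only if FILE exists and has a size greater than zero.
    if [ -s "$1" ]; then
        echo present
    else
        echo missing
    fi
}

# Usage inside "cephadm shell", before "ceph-volume lvm create":
#   keyring=/var/lib/ceph/bootstrap-osd/ceph.keyring
#   [ "$(keyring_status "$keyring")" = present ] || \
#       ceph auth get client.bootstrap-osd > "$keyring"
```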

Any idea of what is wrong ?

Thanks

Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


I'm still trying to understand what can be wrong or how to debug this situation where Ceph cannot see the devices.

The device /dev/sdc exists:

   [root@mostha1 ~]# cephadm shell lsmcli ldl
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
   -------------------------------------------------------------------------
   /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
   /dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
   /dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it:

   [root@mostha1 ~]# cephadm shell ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'

This has been the case since I moved from Octopus to Pacific.

Patrick