Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 11 Oct 2023 15:02:47 +0200

Hi Eugen,

Sorry for posting twice; my Zimbra server returned an error on the first 
attempt.

My initial problem is that Ceph has not been able to detect these HDDs 
since Pacific.
So I deployed Octopus, where "ceph orch apply osd 
--all-available-devices" works fine, and then upgraded to Pacific.
But during the upgrade, 2 OSDs went "out" and "down", and I'm looking 
for a way to manually re-integrate these 2 HDDs into the cluster, since 
Pacific is not able to do this automatically with "ceph orch ..." the 
way Octopus does.
Still, it is a test cluster for understanding Ceph and learning the 
basics (and I'm allowed to break everything).

Patrick

On 11/10/2023 at 14:35, Eugen Block wrote:
Don't use ceph-volume manually to deploy OSDs if your cluster is 
managed by cephadm. I just wanted to point out that you hadn't wiped 
the disk properly to be able to re-use it. Let the orchestrator handle 
the OSD creation and activation. I recommend removing the OSD again, 
wiping it properly (cephadm ceph-volume lvm zap --destroy /dev/sdc) and 
then letting the orchestrator add it as an OSD. Depending on your 
drivegroup configuration, this will happen automatically (if 
"all-available-devices" is enabled or your OSD specs are already 
applied). If it doesn't happen automatically, deploy it with 'ceph 
orch daemon add osd <host>:<device-path>' [1].

[1] https://docs.ceph.com/en/quincy/cephadm/services/osd/#deploy-osds
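
The sequence described above could be sketched roughly as follows. This is only a sketch under assumptions: the OSD id (2), host (mostha1) and device (/dev/sdc) are the ones that appear elsewhere in this thread, `ceph osd purge` is assumed to be an acceptable way to remove the manually created OSD, and the `run` helper and `DRY_RUN` variable are hypothetical names introduced for illustration. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: remove a manually created OSD, wipe its disk, and hand the
# device back to the orchestrator.
# DRY_RUN and run() are hypothetical helpers, not part of Ceph.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: readd_osd <osd-id> <host> <device-path>
readd_osd() {
    osd_id="$1"; host="$2"; dev="$3"
    # Assumption: purging is acceptable here (removes the CRUSH entry,
    # the auth key and the OSD id of the half-created OSD).
    run ceph osd purge "$osd_id" --yes-i-really-mean-it
    # Wipe command as quoted in the message above (destroys LVM metadata).
    run cephadm ceph-volume lvm zap --destroy "$dev"
    # Only needed if no applied OSD spec picks the device up automatically.
    run ceph orch daemon add osd "$host:$dev"
}

readd_osd 2 mostha1 /dev/sdc
```

Setting DRY_RUN=0 would execute the commands for real; the zap step is destructive, so the dry-run default is deliberate.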

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

Hi Eugen,

- the OS is Alma Linux 8 with the latest updates.

- this morning I worked with ceph-volume, but it ended in a 
strange final state. I was logged in on host mostha1, where /dev/sdc 
was not recognized. These are the steps I followed, based on the 
ceph-volume documentation I've read:

   [root@mostha1 ~]# cephadm shell
   [ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
   [ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

   [ceph: root@mostha1 /]# ceph-volume lvm list

   ====== osd.2 =======

     [block]       /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

         block device              /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
         block uuid                Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
         cephx lockbox secret
         cluster fsid              250f9864-0142-11ee-8e5f-00266cf8869c
         cluster name              ceph
         crush device class
         encrypted                 0
         osd fsid                  45c8e92c-caf9-4fe7-9a42-7b45a0794632
         osd id                    2
         osdspec affinity
         type                      block
         vdo                       0
         devices                   /dev/sdc

Now the lsblk command shows sdc as an OSD:
....
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0 lvm
sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5    0 232.8G  0 lvm

But this osd.2 is "down" and "out" with a strange status (no associated 
cluster host, no weight...), and I cannot activate it because systemctl 
does not work inside the podman container.

   [ceph: root@mostha1 /]# ceph osd tree
   ID  CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT PRI-AFF
   -1         1.72823  root default
   -5         0.45477      host dean
     0    hdd  0.22739          osd.0         up   1.00000 1.00000
     4    hdd  0.22739          osd.4         up   1.00000 1.00000
   -9         0.22739      host ekman
     6    hdd  0.22739          osd.6         up   1.00000 1.00000
   -7         0.45479      host mostha1
     5    hdd  0.45479          osd.5         up   1.00000 1.00000
   -3         0.59128      host mostha2
     1    hdd  0.22739          osd.1         up   1.00000 1.00000
     3    hdd  0.36389          osd.3         up   1.00000 1.00000
     2               0  osd.2               down         0 1.00000
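
On a larger cluster, rows like the osd.2 one are easy to miss. A minimal sketch for picking out the OSDs reported "down" from `ceph osd tree` output is shown here; `down_osds` is a hypothetical helper name, and the two sample rows are copied from the tree above.

```shell
#!/bin/sh
# Sketch: list OSDs reported "down" in `ceph osd tree` output read from stdin.
# down_osds is a hypothetical helper name.
down_osds() {
    awk '{
        # An OSD row has a field like "osd.N" directly followed by its status.
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down")
                print $i
    }'
}

# Example with two rows copied from the tree above.
down_osds <<'EOF'
  5    hdd  0.45479          osd.5         up   1.00000 1.00000
  2               0  osd.2               down         0 1.00000
EOF
```

On a live cluster the same helper could be fed with `ceph osd tree | down_osds`.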

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 
45c8e92c-caf9-4fe7-9a42-7b45a0794632
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph 
prime-osd-dir --dev 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
--path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
/var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph 
/var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable 
ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632
 stderr: Created symlink 
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632.service 
-> /usr/lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
 stderr: Created symlink 
/run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service -> 
/usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
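
The "Failed to connect to bus" line means systemctl could not reach a running systemd, which is expected inside a container that does not run systemd itself. A minimal sketch of a check for this is given below. It relies on the /run/systemd/system directory, which is what sd_booted(3) tests for; the `systemd_running` name and its optional root argument are hypothetical and only there to make the check easy to try out.

```shell
#!/bin/sh
# Sketch: detect whether systemd is running where this shell executes.
# /run/systemd/system is the directory sd_booted(3) checks for.
# The optional argument (an alternative root directory) is hypothetical.
systemd_running() {
    root="${1:-}"
    [ -d "$root/run/systemd/system" ]
}

if systemd_running; then
    echo "systemd is running: systemctl start can reach it"
else
    echo "no systemd here (e.g. inside 'cephadm shell'): systemctl start will fail"
fi
```

In a cephadm-managed cluster the OSD units live on the host rather than in the shell container, which fits the advice in this thread to leave activation to the orchestrator. The Pacific cephadm documentation also describes `ceph cephadm osd activate <host>` for existing OSDs; whether it helps in this particular case is an assumption, not something confirmed in this thread.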

Patrick

On 11/10/2023 at 11:00, Eugen Block wrote:
Hi,

just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would 
help here. Judging from your previous output, you didn't specify the 
--destroy flag.
Which cephadm version is installed on the host? Did you also upgrade 
the OS when moving to Pacific? (Sorry if I missed that.)

Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:

On 02/10/2023 at 18:22, Patrick Bégou wrote:
Hi all,

still stuck with this problem.

I deployed Octopus and all my HDDs were set up as OSDs. Fine.
I then upgraded to Pacific and 2 OSDs failed. They were 
automatically removed and the upgrade finished. Cluster health is 
finally OK, with no data loss.

But now I cannot re-add these OSDs with Pacific (I had previous 
trouble with these old HDDs: I lost one OSD in Octopus and was able 
to reset and re-add it).

I tried to manually add the first OSD on the node where it is 
located, following 
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ 
(not sure it's the best idea...), but it fails too. This node was 
the one used for deploying the cluster.

[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdc
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will 
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M 
count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdc>

[ceph: root@mostha1 /]# ceph-volume lvm create --bluestore --data /dev/sdc
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 auth: unable 
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) 
No such file or directory
 stderr: 2023-10-02T16:09:29.855+0000 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e405c4d8) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 auth: unable 
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) 
No such file or directory
 stderr: 2023-10-02T16:09:29.856+0000 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e40601d0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 auth: unable 
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) 
No such file or directory
 stderr: 2023-10-02T16:09:29.857+0000 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4eb8bee90) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e965c700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods 
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e9e5d700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods 
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4e8e5b700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods 
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+0000 7fb4eb8c0700 -1 monclient: 
authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to 
the cluster)
-->  RuntimeError: Unable to create a new OSD id

Any idea what is wrong?
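
The log above repeatedly reports that no keyring was found at /var/lib/ceph/bootstrap-osd/ceph.keyring, and the later attempt in this thread exports that keyring with `ceph auth get client.bootstrap-osd` before calling ceph-volume. A minimal sketch of a pre-flight check along those lines follows; `keyring_hint` and the `KEYRING` override are hypothetical names, and the script only prints a hint rather than changing anything.

```shell
#!/bin/sh
# Sketch: check for the bootstrap-osd keyring that ceph-volume looks for.
# KEYRING can be overridden; keyring_hint is a hypothetical helper name.
KEYRING="${KEYRING:-/var/lib/ceph/bootstrap-osd/ceph.keyring}"

keyring_hint() {
    # -s: the file exists and is not empty.
    if [ -s "$1" ]; then
        echo "keyring present: $1"
    else
        echo "keyring missing, export it with:"
        echo "  ceph auth get client.bootstrap-osd > $1"
    fi
}

keyring_hint "$KEYRING"
```

This only addresses the authentication error; it does not change the advice given in this thread to let the orchestrator create OSDs on a cephadm-managed cluster.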

Thanks

Patrick
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

I'm still trying to understand what can be wrong or how to debug 
this situation where Ceph cannot see the devices.

The device /dev/sdc exists:

   [root@mostha1 ~]# cephadm shell lsmcli ldl
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

   Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
   -------------------------------------------------------------------------
   /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
   /dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
   /dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it:

   [root@mostha1 ~]# cephadm shell ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

   Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'

This has been the case since I moved from Octopus to Pacific.

Patrick