Hi,
before getting into that, the first thing I would do is fail the mgr.
There have been quite a few issues where simply failing over the mgr
resolved them.
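In case it helps, failing the mgr is just a one-liner; this is only a
sketch of the commands I mean, nothing cluster-specific:
```
# fail over to a standby mgr, then check which one took over
ceph mgr fail
ceph mgr stat
```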
If that doesn't help, the cephadm.log should show something useful
(/var/log/ceph/cephadm.log on the OSD hosts, I'm still not too
familiar with the whole 'ceph log last 200 debug cephadm' thing).
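Roughly, I'd look at both log sources like this (same paths/commands
as mentioned above):
```
# plain cephadm log on the affected OSD host
tail -n 200 /var/log/ceph/cephadm.log

# or the cluster-wide cephadm debug channel via the mon
ceph log last 200 debug cephadm
```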
I remember reports in earlier versions of ceph-volume (probably
pre-cephadm) where not all OSDs were created if the host had many
disks to deploy. But I can't find those threads right now.
And it's strange that on the second cluster no OSD is created at all,
but again, maybe fail the mgr first before looking deeper into it.
Regards,
Eugen
Quoting "Kuhring, Mathias" <mathias.kuhring@xxxxxxxxxxxxxx>:
Dear ceph community,
We are having trouble with new disks not being properly prepared, i.e.
OSDs not being fully deployed by cephadm.
We just added one new node with ~40 HDDs to each of two of our
ceph clusters.
In one cluster all but 5 disks got installed automatically.
In the other none got installed.
We are on ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) on both
clusters.
(I haven't added new disks since the last upgrade if I recall correctly).
This is our OSD service definition:
```
0|0[root@ceph-3-10 ~]# ceph orch ls osd --export
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
```
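In case it's relevant for troubleshooting, this is how one can query
the orchestrator's view of the disks (no host filter here, just as an
example):
```
# what does the orchestrator currently report as available devices?
ceph orch device ls --wide --refresh
```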
Usually, new disks are installed properly (as expected due to
all-available-devices).
This time, I can see that LVs were created (via `lsblk`, `lvs`,
`cephadm ceph-volume lvm list`).
And OSDs are entered into the crush map.
However, they are not assigned to a host yet, nor do they have a
type or weight, e.g.:
```
0|0[root@ceph-2-10 ~]# ceph osd tree | grep "0 osd"
518 0 osd.518 down 0 1.00000
519 0 osd.519 down 0 1.00000
520 0 osd.520 down 0 1.00000
521 0 osd.521 down 0 1.00000
522 0 osd.522 down 0 1.00000
```
And there is also no OSD daemon created (no docker container).
So, OSD creation is somehow stuck halfway.
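For what it's worth, this is roughly how one can verify that no daemon
exists (the exact grep patterns are just examples):
```
# orchestrator view: osd.522 should show up here if a daemon existed
ceph orch ps | grep osd.522

# directly on the host: list the daemons cephadm knows about
cephadm ls | grep '"name"'
```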
I thought of fully cleaning up the OSDs/disks, hoping cephadm might
pick them up properly next time.
Just zapping was not possible, e.g. `cephadm ceph-volume lvm zap
--destroy /dev/sdab` results in these errors:
```
/usr/bin/docker: stderr stderr: wipefs: error: /dev/sdab: probing
initialization failed: Device or resource busy
/usr/bin/docker: stderr --> failed to wipefs device, will try again
to workaround probable race condition
```
So, I cleaned up more manually by purging them from crush and
"resetting" the disk and LV with dd and dmsetup, respectively:
```
ceph osd purge 480 --force
dd if=/dev/zero of=/dev/sdab bs=1M count=1
dmsetup remove ceph--e10e0f08--8705--441a--8caa--4590de22a611-osd--block--d464211c--f513--4513--86c1--c7ad63e6c142
```
ceph-volume still reported the old volumes, but then zapping
actually got rid of them (only cleaned out the left-over entries, I
guess).
Afterwards, cephadm was able to bring one OSD up when I did this cleanup
for only one disk.
But when I did it in bulk for the rest, they all got stuck again in the
same way.
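(As an aside, instead of my dd/dmsetup route there are also
orchestrator-level zap commands; I haven't verified whether they behave
any differently here, so this is only a sketch:)
```
# remove the half-created OSD and zap its devices via the orchestrator
ceph orch osd rm 522 --zap --force

# or zap a specific device on a host
ceph orch device zap ceph-2-11 /dev/sdab --force
```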
Looking into ceph-volume logs (here for osd.522 as representative):
```
0|0[root@ceph-2-11
/var/log/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ll *20240316
-rw-r--r-- 1 ceph ceph 613789 Mar 14 17:10 ceph-osd.522.log-20240316
-rw-r--r-- 1 root root 42473553 Mar 16 03:13 ceph-volume.log-20240316
```
ceph-volume only reports keyring creation:
```
[2024-03-14 16:10:19,509][ceph_volume.util.prepare][INFO ] Creating
keyring file for osd.522
[2024-03-14 16:10:19,510][ceph_volume.process][INFO ] Running
command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-522/keyring
--create-keyring --name osd.522 --add-key
AQBfIfNlinc7EBAAHeFicrjmLEjRPGSjuFuLiQ==
```
In the OSD logs I found a couple of these, but I don't know if they
are related:
```
2024-03-14T16:10:54.706+0000 7fab26988540 2 rocksdb:
[db/column_family.cc:546] Failed to register data paths of column
family (id: 11, name: P)
```
Has anyone seen this behaviour before?
Or could tell me where I should look next to troubleshoot this (which logs)?
Any help is appreciated.
Best Wishes,
Mathias
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx