Re: Ceph orchestrator not refreshing device list

I’ve been away on vacation and just got back to this. I’m happy to report that manually recreating the OSD with ceph-volume and then adopting it with cephadm fixed the problem.
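
For the record, here is roughly what that looked like (osd id 88 and /dev/sdg are taken from the output further down in this thread, --dmcrypt matches our encrypted spec, and exact flags may vary by release):

```
# recreate the destroyed OSD with plain ceph-volume (legacy, non-cephadm)
ceph-volume lvm create --bluestore --data /dev/sdg --osd-id 88 --dmcrypt

# cephadm reports the new legacy osd.88 as a stray daemon until it is adopted
cephadm adopt --style legacy --name osd.88
```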

Thanks again for your help, Eugen!

Cheers,
/rjg

> On Sep 29, 2024, at 10:40 AM, Eugen Block <eblock@xxxxxx> wrote:
> 
> Okay, apparently this is not what I was facing. I see two other
> options right now. The first would be to purge osd.88 from the crush
> tree entirely.
> The second approach would be to create the osd manually with plain
> ceph-volume (not cephadm ceph-volume), which gives you a legacy osd
> (you'd get warnings about a stray daemon). If that works, adopt the
> osd with cephadm.
> I don't have a better idea right now.
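> 
> For the first option, something along these lines should do it
> (untested, double-check the osd id first):
> 
> ceph osd purge 88 --yes-i-really-mean-it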
> 
> Quoting Bob Gibson <rjg@xxxxxxxxxx>:
> 
>> Here are the contents from the same directory on our osd node:
>> 
>> ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l
>> total 412
>> -rw-r--r--  1 root root 366903 Sep 14 14:53 cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b
>> drwx------  3  167  167   4096 Sep 14 15:01 crash
>> drwxr-xr-x 12 root root   4096 Sep 15 12:06 custom_config_files
>> drw-rw----  2 root root   4096 Sep 23 17:00 home
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.84
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.85
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.86
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.87
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.89
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.90
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.91
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.92
>> drwx------  2  167  167   4096 Sep 26 12:47 osd.93
>> drwx------  6 root root   4096 Sep 23 15:59 removed
>> 
>> In our case the osd.88 directory is under the subdirectory named
>> “removed”, the same as the other osds which have been converted.
>> 
>> ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l removed/osd.88_2024-09-23T19\:59\:42.162302Z/
>> total 64
>> lrwxrwxrwx 1 167 167   93 Sep 15 12:10 block -> /dev/ceph-2a13ec6a-a5f0-4773-8254-c38b915c824a/osd-block-7f8f9778-5ae2-47c1-bd03-a92a3a7a1db1
>> -rw------- 1 167 167   37 Sep 15 12:10 ceph_fsid
>> -rw------- 1 167 167  259 Sep 14 15:14 config
>> -rw------- 1 167 167   37 Sep 15 12:10 fsid
>> -rw------- 1 167 167   56 Sep 15 12:10 keyring
>> -rw------- 1 167 167    6 Sep 15 12:10 ready
>> -rw------- 1 167 167    3 Sep 14 11:11 require_osd_release
>> -rw------- 1 167 167   10 Sep 15 12:10 type
>> -rw------- 1 167 167   38 Sep 14 15:14 unit.configured
>> -rw------- 1 167 167   48 Sep 14 15:14 unit.created
>> -rw------- 1 167 167   26 Sep 14 15:06 unit.image
>> -rw------- 1 167 167   76 Sep 14 15:06 unit.meta
>> -rw------- 1 167 167 1527 Sep 14 15:06 unit.poststop
>> -rw------- 1 167 167 2586 Sep 14 15:06 unit.run
>> -rw------- 1 167 167  334 Sep 14 15:06 unit.stop
>> -rw------- 1 167 167    3 Sep 15 12:10 whoami
>> 
>> On Sep 27, 2024, at 9:30 AM, Eugen Block <eblock@xxxxxx> wrote:
>> 
>> Oh interesting, I just got into the same situation (I believe) on a
>> test cluster:
>> 
>> host1:~ # ceph orch ps | grep unknown
>> osd.1                              host6  stopped  72s ago  36m  -  4096M  <unknown>  <unknown>  <unknown>
>> osd.13                             host6  error    72s ago  36m  -  4096M  <unknown>  <unknown>  <unknown>
>> 
>> I still had the remainders on the filesystem:
>> 
>> host6:~ # ll /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
>> total 68
>> lrwxrwxrwx 1 ceph ceph  111 27. Sep 14:43 block -> /dev/mapper/ceph--0e90997f--456e--4a9b--a8f9--a6f1038c1216-osd--block--81e7f32a--a728--4848--b14d--0b86bb7e1c69
>> lrwxrwxrwx 1 ceph ceph  108 27. Sep 14:43 block.db -> /dev/mapper/ceph--9ea6e95f--ad43--4e40--8920--2e772b2efa2f-osd--db--f9c57ec1--77c8--4d9a--85df--1dc053a24000
>> 
>> I just removed those two directories to clear the warning, and now the
>> orchestrator can deploy OSDs again on that node.
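>> 
>> Concretely, that boiled down to something like this (fsid and osd ids
>> as above):
>> 
>> host6:~ # rm -rf /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
>> host6:~ # rm -rf /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.13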
>> 
>> Hope that helps!
>> 
>> Quoting Eugen Block <eblock@xxxxxx>:
>> 
>> Right, if you need encryption, a rebuild is required. Your procedure
>> has already worked 4 times, so I'd say nothing seems wrong with that
>> per se.
>> Regarding the stuck device list, do you see the mgr logging anything
>> suspicious, especially when you say that it only returns output
>> after a failover? Those two osd specs are not conflicting, since the
>> first one is "unmanaged" after adoption.
>> Is there something in 'ceph orch osd rm status'? Can you run
>> 'cephadm ceph-volume inventory' locally on that node? Do you see any
>> hints in the node's syslog? Maybe try a reboot or something?
>> 
>> 
>> Quoting Bob Gibson <rjg@xxxxxxxxxx>:
>> 
>> Thanks for your reply, Eugen. I’m fairly new to cephadm so I wasn’t
>> aware that we could manage the drives without rebuilding them.
>> However, we thought we’d take advantage of this opportunity to also
>> encrypt the drives, and that does require a rebuild.
>> 
>> I have a theory on why the orchestrator is confused. I want to
>> create an osd service for each osd node so I can manage drives on a
>> per-node basis.
>> 
>> I started by creating a spec for the first node:
>> 
>> service_type: osd
>> service_id: ceph-osd31
>> placement:
>>   hosts:
>>   - ceph-osd31
>> spec:
>>   data_devices:
>>     rotational: 0
>>     size: '3TB:'
>>   encrypted: true
>>   filter_logic: AND
>>   objectstore: bluestore
>> 
>> But I also see a default spec, “osd”, which has placement set to
>> “unmanaged”.
>> 
>> `ceph orch ls osd --export` shows the following:
>> 
>> service_type: osd
>> service_name: osd
>> unmanaged: true
>> spec:
>>   filter_logic: AND
>>   objectstore: bluestore
>> ---
>> service_type: osd
>> service_id: ceph-osd31
>> service_name: osd.ceph-osd31
>> placement:
>>   hosts:
>>   - ceph-osd31
>> spec:
>>   data_devices:
>>     rotational: 0
>>     size: '3TB:'
>>   encrypted: true
>>   filter_logic: AND
>>   objectstore: bluestore
>> 
>> `ceph orch ls osd` shows that I was able to convert 4 drives using my spec:
>> 
>> NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
>> osd                         95  10m ago    -    <unmanaged>
>> osd.ceph-osd31               4  10m ago    43m  ceph-osd31
>> 
>> Despite being able to convert 4 drives, I’m wondering if these
>> specs are conflicting with one another and whether that has confused the
>> orchestrator. If so, how do I safely get from where I am now to
>> where I want to be? :-)
>> 
>> Cheers,
>> /rjg
>> 
>> On Sep 26, 2024, at 3:31 PM, Eugen Block <eblock@xxxxxx> wrote:
>> 
>> Hi,
>> 
>> it seems a bit unnecessary to rebuild OSDs just to get them managed.
>> If you apply a spec file that targets your hosts/OSDs, they will
>> appear as managed. So when you need to replace a drive, you
>> can already utilize the orchestrator to remove and zap it.
>> That works just fine.
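>> 
>> Roughly something like this (the spec file name is just an example):
>> 
>> ceph orch apply -i osd-spec.yaml
>> ceph orch osd rm <osd_id> --replace --zap
>> 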
>> How to get out of your current situation is not entirely clear to me
>> yet. I’ll reread your post tomorrow.
>> 
>> Regards,
>> Eugen
>> 
>> Quoting Bob Gibson <rjg@xxxxxxxxxx>:
>> 
>> Hi,
>> 
>> We recently converted a legacy cluster running Quincy v17.2.7 to
>> cephadm. The conversion went smoothly and left all osds unmanaged by
>> the orchestrator as expected. We’re now in the process of converting
>> the osds to be managed by the orchestrator. We successfully
>> converted a few of them, but then the orchestrator somehow got
>> confused. `ceph health detail` reports a “stray daemon” for the osd
>> we’re trying to convert, and the orchestrator is unable to refresh
>> its device list, so it doesn’t see any available devices.
>> 
>> From the perspective of the osd node, the osd has been wiped and is
>> ready to be reinstalled. We’ve also rebooted the node for good
>> measure. `ceph osd tree` shows that the osd has been destroyed, but
>> the orchestrator won’t reinstall it because it thinks the device is
>> still active. The orchestrator device information is stale, but
>> we’re unable to refresh it. The usual recommended workaround of
>> failing over the mgr hasn’t helped. We’ve also tried `ceph orch
>> device ls --refresh` to no avail. In fact, after running that command,
>> subsequent runs of `ceph orch device ls` produce no output until the
>> mgr is failed over again.
>> 
>> Is there a way to force the orchestrator to refresh its list of
>> devices when in this state? If not, can anyone offer any suggestions
>> on how to fix this problem?
>> 
>> Cheers,
>> /rjg
>> 
>> P.S. Some additional information in case it’s helpful...
>> 
>> We’re using the following command to replace existing devices so
>> that they’re managed by the orchestrator:
>> 
>> ```
>> ceph orch osd rm <osd> --replace --zap
>> ```
>> 
>> and we’re currently stuck on osd 88.
>> 
>> ```
>> ceph health detail
>> HEALTH_WARN 1 stray daemon(s) not managed by cephadm
>> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
>> stray daemon osd.88 on host ceph-osd31 not managed by cephadm
>> ```
>> 
>> `ceph osd tree` shows that the osd has been destroyed and is ready
>> to be replaced:
>> 
>> ```
>> ceph osd tree-from ceph-osd31
>> ID   CLASS  WEIGHT    TYPE NAME        STATUS     REWEIGHT  PRI-AFF
>> -46         34.93088  host ceph-osd31
>> 84    ssd   3.49309      osd.84              up   1.00000  1.00000
>> 85    ssd   3.49309      osd.85              up   1.00000  1.00000
>> 86    ssd   3.49309      osd.86              up   1.00000  1.00000
>> 87    ssd   3.49309      osd.87              up   1.00000  1.00000
>> 88    ssd   3.49309      osd.88       destroyed         0  1.00000
>> 89    ssd   3.49309      osd.89              up   1.00000  1.00000
>> 90    ssd   3.49309      osd.90              up   1.00000  1.00000
>> 91    ssd   3.49309      osd.91              up   1.00000  1.00000
>> 92    ssd   3.49309      osd.92              up   1.00000  1.00000
>> 93    ssd   3.49309      osd.93              up   1.00000  1.00000
>> ```
>> 
>> The cephadm log shows a claim on node `ceph-osd31` for that osd:
>> 
>> ```
>> 2024-09-25T14:15:45.699348-0400 mgr.ceph-mon3.qzjgws [INF] Found osd claims -> {'ceph-osd31': ['88']}
>> 2024-09-25T14:15:45.699534-0400 mgr.ceph-mon3.qzjgws [INF] Found osd claims for drivegroup ceph-osd31 -> {'ceph-osd31': ['88']}
>> ```
>> 
>> `ceph orch device ls` shows that the device list isn’t refreshing:
>> 
>> ```
>> ceph orch device ls ceph-osd31
>> HOST        PATH      TYPE  DEVICE ID                               SIZE   AVAILABLE  REFRESHED  REJECT REASONS
>> ceph-osd31  /dev/sdc  ssd   INTEL_SSDSC2KG038T8_PHYG039603PE3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdd  ssd   INTEL_SSDSC2KG038T8_PHYG039600AY3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sde  ssd   INTEL_SSDSC2KG038T8_PHYG039600CW3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdf  ssd   INTEL_SSDSC2KG038T8_PHYG039600CM3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdg  ssd   INTEL_SSDSC2KG038T8_PHYG039600UB3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdh  ssd   INTEL_SSDSC2KG038T8_PHYG039603753P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdi  ssd   INTEL_SSDSC2KG038T8_PHYG039603R63P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdj  ssd   INTEL_SSDSC2KG038TZ_PHYJ4011032M3P8DGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdk  ssd   INTEL_SSDSC2KG038TZ_PHYJ3234010J3P8DGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ceph-osd31  /dev/sdl  ssd   INTEL_SSDSC2KG038T8_PHYG039603NS3P8EGN  3576G  No         22h ago    Insufficient space (<10 extents) on vgs, LVM detected, locked
>> ```
>> 
>> `ceph node ls` thinks the osd still exists:
>> 
>> ```
>> ceph node ls osd | jq -r '."ceph-osd31"'
>> [
>> 84,
>> 85,
>> 86,
>> 87,
>> 88, <— this shouldn’t exist
>> 89,
>> 90,
>> 91,
>> 92,
>> 93
>> ]
>> ```
>> 
>> Each osd node has 10x 3.8 TB ssd drives for osds. On `ceph-osd31`,
>> cephadm doesn’t see osd.88 as expected:
>> 
>> ```
>> cephadm ls --no-detail
>> [
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.93",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.93"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.85",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.85"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.90",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.90"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.92",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.92"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.89",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.89"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.87",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.87"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.86",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.86"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.84",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.84"
>> },
>> {
>>     "style": "cephadm:v1",
>>     "name": "osd.91",
>>     "fsid": "9b3b3539-59a9-4338-8bab-3badfab6e855",
>>     "systemd_unit": "ceph-9b3b3539-59a9-4338-8bab-3badfab6e855@osd.91"
>> }
>> ]
>> ```
>> 
>> `lsblk` shows that `/dev/sdg` has been wiped.
>> 
>> ```
>> NAME                                                                                                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
>> sda                                                                                                      8:0    0 223.6G  0 disk
>> |-sda1                                                                                                   8:1    0    94M  0 part
>> `-sda2                                                                                                   8:2    0 223.5G  0 part
>>   `-md0                                                                                                  9:0    0 223.4G  0 raid1 /
>> sdb                                                                                                      8:16   0 223.6G  0 disk
>> |-sdb1                                                                                                   8:17   0    94M  0 part
>> `-sdb2                                                                                                   8:18   0 223.5G  0 part
>>   `-md0                                                                                                  9:0    0 223.4G  0 raid1 /
>> sdc                                                                                                      8:32   1   3.5T  0 disk
>> `-ceph--03782b4c--9faa--49f5--b554--98e7b8515834-osd--block--ba272724--daa6--45f5--9f69--789cc0bda077  253:3    0   3.5T  0 lvm
>>   `-keCkP2-o6h8-jKkw-RKiD-UBFf-A8EL-JDJGPR                                                             253:9    0   3.5T  0 crypt
>> sdd                                                                                                      8:48   1   3.5T  0 disk
>> `-ceph--c07907d8--4a75--4ba3--b5e1--2ebf49ecbdf6-osd--block--58d1d50d--6228--4e6f--9a52--2a305ba00700  253:7    0   3.5T  0 lvm
>>   `-WB8Mxn-qCHI-4T01-imiG-hNBR-by60-YuxgfD                                                             253:11   0   3.5T  0 crypt
>> sde                                                                                                      8:64   1   3.5T  0 disk
>> `-ceph--6f9d4df4--7ce6--44a4--a7b1--62c85af8cfe0-osd--block--aabcb30d--0084--490a--969b--78f7af6e94da  253:8    0   3.5T  0 lvm
>>   `-g9qErH-vTXY-JQbs-eh61-W0Mn-TAV8-gof4zy                                                             253:12   0   3.5T  0 crypt
>> sdf                                                                                                      8:80   1   3.5T  0 disk
>> `-ceph--d6b728f8--e365--46db--b30f--6c00805c752b-osd--block--88426db7--2322--4807--ac2e--b49929e170d6  253:6    0   3.5T  0 lvm
>>   `-LNG2gB-pa0w-gl2v-hVQ3-6qTd-aXsR-Lenri3                                                             253:10   0   3.5T  0 crypt
>> sdg                                                                                                      8:96   1   3.5T  0 disk
>> sdh                                                                                                      8:112  1   3.5T  0 disk
>> `-ceph--de2cfee6--8e0a--4aa0--9e6b--90c09025768c-osd--block--a3b86251--2799--4243--a857--f218fa90f29a  253:2    0   3.5T  0 lvm
>> sdi                                                                                                      8:128  1   3.5T  0 disk
>> `-ceph--30dee450--0fdd--46ea--9eec--6a4c7706df9c-osd--block--bfc090db--dde4--47dd--a1c9--1cd838ea43b3  253:4    0   3.5T  0 lvm
>> sdj                                                                                                      8:144  1   3.5T  0 disk
>> `-ceph--78febcf5--43f4--4820--8dc7--0f6c22816c9f-osd--block--da1e69c7--6427--4562--8290--90bcb9526747  253:0    0   3.5T  0 lvm
>> sdk                                                                                                      8:160  1   3.5T  0 disk
>> `-ceph--fe210281--b1f5--4d5e--9ab0--2f226612af00-osd--block--6bb9f308--e853--4303--83ea--553c3a3513e1  253:1    0   3.5T  0 lvm
>> sdl                                                                                                      8:176  1   3.5T  0 disk
>> `-ceph--9f21c916--f211--4d1b--8214--6ad1cecac810-osd--block--572d850c--c201--4af4--ac42--0ed2a6ed73ed  253:5    0   3.5T  0 lvm
>> ```
>> 
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> 
> 
> 
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



