Here are the contents of the same directory on our osd node:

ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l
total 412
-rw-r--r--  1 root root 366903 Sep 14 14:53 cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b
drwx------  3 167  167    4096 Sep 14 15:01 crash
drwxr-xr-x 12 root root   4096 Sep 15 12:06 custom_config_files
drw-rw----  2 root root   4096 Sep 23 17:00 home
drwx------  2 167  167    4096 Sep 26 12:47 osd.84
drwx------  2 167  167    4096 Sep 26 12:47 osd.85
drwx------  2 167  167    4096 Sep 26 12:47 osd.86
drwx------  2 167  167    4096 Sep 26 12:47 osd.87
drwx------  2 167  167    4096 Sep 26 12:47 osd.89
drwx------  2 167  167    4096 Sep 26 12:47 osd.90
drwx------  2 167  167    4096 Sep 26 12:47 osd.91
drwx------  2 167  167    4096 Sep 26 12:47 osd.92
drwx------  2 167  167    4096 Sep 26 12:47 osd.93
drwx------  6 root root   4096 Sep 23 15:59 removed

In our case the osd.88 directory is under the subdirectory named "removed", the same as the other osds which have already been converted.

ceph-osd31.prod.os:/var/lib/ceph/9b3b3539-59a9-4338-8bab-3badfab6e855# ls -l removed/osd.88_2024-09-23T19\:59\:42.162302Z/
total 64
lrwxrwxrwx 1 167 167   93 Sep 15 12:10 block -> /dev/ceph-2a13ec6a-a5f0-4773-8254-c38b915c824a/osd-block-7f8f9778-5ae2-47c1-bd03-a92a3a7a1db1
-rw------- 1 167 167   37 Sep 15 12:10 ceph_fsid
-rw------- 1 167 167  259 Sep 14 15:14 config
-rw------- 1 167 167   37 Sep 15 12:10 fsid
-rw------- 1 167 167   56 Sep 15 12:10 keyring
-rw------- 1 167 167    6 Sep 15 12:10 ready
-rw------- 1 167 167    3 Sep 14 11:11 require_osd_release
-rw------- 1 167 167   10 Sep 15 12:10 type
-rw------- 1 167 167   38 Sep 14 15:14 unit.configured
-rw------- 1 167 167   48 Sep 14 15:14 unit.created
-rw------- 1 167 167   26 Sep 14 15:06 unit.image
-rw------- 1 167 167   76 Sep 14 15:06 unit.meta
-rw------- 1 167 167 1527 Sep 14 15:06 unit.poststop
-rw------- 1 167 167 2586 Sep 14 15:06 unit.run
-rw------- 1 167 167  334 Sep 14 15:06 unit.stop
-rw------- 1 167 167    3 Sep 15 12:10 whoami

On Sep 27, 2024, at 9:30 AM, Eugen Block <eblock@xxxxxx> wrote:

Oh interesting, I just got into the same situation (I believe) on a test cluster:

host1:~ # ceph orch ps | grep unknown
osd.1    host6  stopped  72s ago  36m  -  4096M  <unknown>  <unknown>  <unknown>
osd.13   host6  error    72s ago  36m  -  4096M  <unknown>  <unknown>  <unknown>

I still had the remainders on the filesystem:

host6:~ # ll /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1
insgesamt 68
lrwxrwxrwx 1 ceph ceph 111 27. Sep 14:43 block -> /dev/mapper/ceph--0e90997f--456e--4a9b--a8f9--a6f1038c1216-osd--block--81e7f32a--a728--4848--b14d--0b86bb7e1c69
lrwxrwxrwx 1 ceph ceph 108 27. Sep 14:43 block.db -> /dev/mapper/ceph--9ea6e95f--ad43--4e40--8920--2e772b2efa2f-osd--db--f9c57ec1--77c8--4d9a--85df--1dc053a24000

I just removed those two directories to clear the warning, and now the orchestrator can deploy OSDs on that node again. Hope that helps!

Zitat von Eugen Block <eblock@xxxxxx>:

Right, if you need encryption, a rebuild is required. Your procedure has already worked 4 times, so I'd say nothing is wrong with it per se.

Regarding the stuck device list: do you see the mgr logging anything suspicious, especially given that the list only returns output after a failover? Those two osd specs are not conflicting, since the first one is "unmanaged" after adoption. Is there anything in 'ceph orch osd rm status'? Can you run 'cephadm ceph-volume inventory' locally on that node? Do you see any hints in the node's syslog? Maybe try a reboot?
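For reference, a minimal sketch of the checks and cleanup described above. The fsid and daemon name are taken from Eugen's test-cluster example; substitute your own, and only remove a daemon directory after confirming the OSD no longer exists in the cluster:

  # find daemons the orchestrator reports in an unknown/error state
  ceph orch ps | grep unknown

  # check for pending or stuck OSD removals
  ceph orch osd rm status

  # inspect the node's devices locally (run on the affected host)
  cephadm ceph-volume inventory

  # remove the leftover daemon directory (path from Eugen's example above)
  rm -rf /var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.1

  # fail over the active mgr so the orchestrator refreshes its view
  ceph mgr fail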
Zitat von Bob Gibson <rjg@xxxxxxxxxx>:

Thanks for your reply, Eugen. I'm fairly new to cephadm, so I wasn't aware that we could manage the drives without rebuilding them. However, we thought we'd take advantage of this opportunity to also encrypt the drives, and that does require a rebuild.

I have a theory on why the orchestrator is confused. I want to create an osd service for each osd node so I can manage drives on a per-node basis. I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
    rotational: 0
    size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

But I also see a default spec, "osd", which has placement set to "unmanaged". `ceph orch ls osd --export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
    rotational: 0
    size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                         95  10m ago    -    <unmanaged>
osd.ceph-osd31               4  10m ago    43m  ceph-osd31

Despite being able to convert 4 drives, I'm wondering whether these specs are conflicting with one another and that is what has confused the orchestrator. If so, how do I safely get from where I am now to where I want to be? :-)

Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

it seems a bit unnecessary to rebuild OSDs just to get them managed. If you apply a spec file that targets your hosts/OSDs, they will appear as managed, so when you need to replace a drive you can already use the orchestrator to remove and zap it. That works just fine. How to get out of your current situation is not entirely clear to me yet; I'll reread your post tomorrow.

Regards,
Eugen

Zitat von Bob Gibson <rjg@xxxxxxxxxx>:

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to cephadm. The conversion went smoothly and left all osds unmanaged by the orchestrator, as expected. We're now in the process of converting the osds to be managed by the orchestrator. We successfully converted a few of them, but then the orchestrator somehow got confused. `ceph health detail` reports a "stray daemon" for the osd we're trying to convert, and the orchestrator is unable to refresh its device list, so it doesn't see any available devices.
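For completeness, a minimal sketch of the per-node conversion flow discussed above, assuming the spec shown earlier is saved as ceph-osd31.yaml (the filename, and the use of osd.88 as the drive being rebuilt, are illustrative):

  # apply (or re-apply) the per-node OSD spec
  ceph orch apply -i ceph-osd31.yaml

  # confirm both specs and their placements
  ceph orch ls osd --export

  # force a rescan of the node's devices
  ceph orch device ls ceph-osd31 --refresh

  # rebuild one drive under orchestrator management: remove and zap it;
  # the managed spec should then redeploy it encrypted, per the spec
  ceph orch osd rm 88 --replace --zap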