Re: Ceph orchestrator not refreshing device list

Thanks for your reply, Eugen. I’m fairly new to cephadm, so I wasn’t aware that we could manage the drives without rebuilding them. However, we thought we’d take advantage of this opportunity to also encrypt the drives, and that does require a rebuild.

I have a theory on why the orchestrator is confused. I want to create an osd service for each osd node so I can manage drives on a per-node basis.

I started by creating a spec for the first node:

service_type: osd
service_id: ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
    rotational: 0
    size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore
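
(For completeness, a spec like this is applied with `ceph orch apply -i <spec-file>`; the filename below is just an example, and `--dry-run` previews what the orchestrator would do before committing.)

ceph orch apply -i osd-ceph-osd31.yaml --dry-run
ceph orch apply -i osd-ceph-osd31.yaml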

But I also see a default spec, “osd”, which is flagged as unmanaged (it shows up as <unmanaged> under PLACEMENT).

`ceph orch ls osd --export` shows the following:

service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ceph-osd31
service_name: osd.ceph-osd31
placement:
  hosts:
  - ceph-osd31
spec:
  data_devices:
    rotational: 0
    size: '3TB:'
  encrypted: true
  filter_logic: AND
  objectstore: bluestore

`ceph orch ls osd` shows that I was able to convert 4 drives using my spec:

NAME            PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd                         95  10m ago    -    <unmanaged>
osd.ceph-osd31               4  10m ago    43m  ceph-osd31

Despite being able to convert 4 drives, I’m wondering if these specs are conflicting with one another and whether that’s what has confused the orchestrator. If so, how do I safely get from where I am now to where I want to be? :-)
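
Naively, I’d be tempted to just remove the default spec, along these lines (exporting a backup of the current specs first):

ceph orch ls osd --export > osd-specs-backup.yaml
ceph orch rm osd

but I’m not sure whether that’s safe while most of the OSDs are still attached to it, so I’d rather ask before trying anything.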

Cheers,
/rjg

On Sep 26, 2024, at 3:31 PM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

It seems a bit unnecessary to rebuild OSDs just to get them managed.
If you apply a spec file that targets your hosts/OSDs, they will
appear as managed. So when you need to replace a drive, you can
already use the orchestrator to remove and zap it. That works just
fine.
How to get out of your current situation is not entirely clear to me
yet. I’ll reread your post tomorrow.

Regards,
Eugen

Quoting Bob Gibson <rjg@xxxxxxxxxx>:

Hi,

We recently converted a legacy cluster running Quincy v17.2.7 to
cephadm. The conversion went smoothly and left all osds unmanaged by
the orchestrator as expected. We’re now in the process of converting
the osds to be managed by the orchestrator. We successfully
converted a few of them, but then the orchestrator somehow got
confused. `ceph health detail` reports a “stray daemon” for the osd
we’re trying to convert, and the orchestrator is unable to refresh
its device list so it doesn’t see any available devices.
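
(Concretely, the checks look roughly like this, with the hostname as a placeholder:

ceph health detail
ceph orch device ls <host> --refresh

and the latter shows no available devices on the affected host.)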

