Re: Created no osd(s) on host, already created?

Hi Eugen,

>From: Eugen Block <eblock@xxxxxx>
>Sent: Friday, March 7, 2025 1:21 AM
>
>
>can you show the output of 'ceph orch ls osd --export'?

# ceph orch ls osd --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement:
  host_pattern: node-osd5
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore
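
For reference, I assume this spec was originally applied from a YAML
file with something like the following (osd_spec.yaml is just my guess
at the filename):

# ceph orch apply -i osd_spec.yaml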


>I would look 
>in the cephadm.log and ceph-volume.log on that node as well as in the 
>active mgr log.

The cephadm.log shows wipefs failing during the zap, which ceph-volume
attributes to a probable race condition:


2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr --> Zapping: /dev/sdah
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr  stderr: wipefs: error: /dev/sdah: probing initialization failed: Device or resource busy
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr --> failed to wipefs device, will try again to workaround probable race condition
...
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr Traceback (most recent call last):
2025-03-06 16:34:31,729 7f033324db80 INFO /usr/bin/podman: stderr RuntimeError: could not complete wipefs on device: /dev/sdah
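
In case it matters, I suppose I can check on the node whether something
is still holding the device (device path taken from the log above):

# lsblk /dev/sdah
# dmsetup ls | grep ceph
# ceph orch device ls node-osd5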


There are similar messages in ceph-volume.log. Looking at
ceph-osd.2.log around the same time, it seems that osd.2 was
being created, but these are the last entries for it:


2025-03-06T14:38:57.605-0600 7fc9a6185540  4 rocksdb: [db/db_impl/db_impl.cc:446] Shutdown: canceling all background work
2025-03-06T14:38:57.606-0600 7fc9a6185540  4 rocksdb: [db/db_impl/db_impl.cc:625] Shutdown complete
2025-03-06T14:38:57.606-0600 7fc9a6185540  1 bluefs umount
2025-03-06T14:38:57.606-0600 7fc9a6185540  1 bdev(0x55bd46237000 /var/lib/ceph/osd/ceph-2//block) close
2025-03-06T14:38:57.876-0600 7fc9a6185540  1 freelist shutdown
2025-03-06T14:38:57.876-0600 7fc9a6185540  1 bdev(0x55bd46237800 /var/lib/ceph/osd/ceph-2//block) close
2025-03-06T14:38:58.125-0600 7fc9a6185540  0 created object store /var/lib/ceph/osd/ceph-2/ for osd.2 fsid 26315dca-383a-11ee-9d49-00620b4c2392
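
I guess I can also check whether osd.2 is actually registered in the
cluster and whether its LVs were left behind on node-osd5, roughly like:

# ceph osd tree | grep 'osd.2'
# cephadm shell -- ceph-volume lvm list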


>If you already have an osd service that would pick up 
>this osd, and you zapped it, its creation might have been interrupted. 

I think that's exactly what happened.

>If you create all your OSDs manually with 'ceph orch daemon add
>osd...' this theory doesn't make much sense.

This cluster was deployed by a contractor a couple of years
ago, and it appears that they did indeed use an OSD spec to create
the OSDs, so I think your theory is right (as is Robert's, in the
other reply).

Do you think zapping the disk with the orchestrator and waiting
for the OSD spec to pick it up and recreate the OSD automatically
would work?
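
If so, I assume that would be something like (path from the log above):

# ceph orch device zap node-osd5 /dev/sdah --force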

Should I be worried about leftovers from this OSD, whose deployment
was attempted but interrupted by my manual actions?
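
If there is a leftover osd.2 entry but no running daemon, I suppose
I could clean it up with something like the following, after
double-checking in 'ceph osd tree' that it is down and holds no data:

# ceph osd purge 2 --yes-i-really-mean-it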

Thank you
-
Gustavo
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



