Hi Eugen,

>From: Eugen Block <eblock@xxxxxx>
>Sent: Friday, March 7, 2025 1:21 AM
>
>
>can you show the output of 'ceph orch ls osd --export'?

# ceph orch ls osd --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement:
  host_pattern: node-osd5
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore

>I would look
>in the cephadm.log and ceph-volume.log on that node as well as in the
>active mgr log.

The cephadm.log points to a race condition while zapping:

2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr --> Zapping: /dev/sdah
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr stderr: wipefs: error: /dev/sdah: probing initialization failed: Device or resource busy
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr --> failed to wipefs device, will try again to workaround probable race condition
...
2025-03-06 16:34:31,728 7f033324db80 INFO /usr/bin/podman: stderr Traceback (most recent call last):
2025-03-06 16:34:31,729 7f033324db80 INFO /usr/bin/podman: stderr RuntimeError: could not complete wipefs on device: /dev/sdah

There are similar entries in ceph-volume.log.

Looking at ceph-osd.2.log around the same time, it seems that osd.2 was being created, but this is the last of it:

2025-03-06T14:38:57.605-0600 7fc9a6185540 4 rocksdb: [db/db_impl/db_impl.cc:446] Shutdown: canceling all background work
2025-03-06T14:38:57.606-0600 7fc9a6185540 4 rocksdb: [db/db_impl/db_impl.cc:625] Shutdown complete
2025-03-06T14:38:57.606-0600 7fc9a6185540 1 bluefs umount
2025-03-06T14:38:57.606-0600 7fc9a6185540 1 bdev(0x55bd46237000 /var/lib/ceph/osd/ceph-2//block) close
2025-03-06T14:38:57.876-0600 7fc9a6185540 1 freelist shutdown
2025-03-06T14:38:57.876-0600 7fc9a6185540 1 bdev(0x55bd46237800 /var/lib/ceph/osd/ceph-2//block) close
2025-03-06T14:38:58.125-0600 7fc9a6185540 0 created object store /var/lib/ceph/osd/ceph-2/ for osd.2 fsid 26315dca-383a-11ee-9d49-00620b4c2392

>If you already have an osd service that would pick up
>this osd, and you zapped it, its creation might have been interrupted.

I think that's exactly what happened.

>If you create all your OSDs manually with 'ceph orch daemon add osd...'
>this theory doesn't make much sense.

This cluster was deployed by a contractor a couple of years ago, and it turns out they did indeed use an OSD spec to create the OSDs, so I think your theory is right (as is Robert's, in the other reply).

Do you think zapping the disk with the orchestrator and then waiting for it to be picked up by the automatic OSD creation would work? And should I be worried about leftovers from the OSD whose deployment my manual actions interrupted?

Thank you
- Gustavo
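
P.S. For the record, here is a rough sketch of what I plan to run, going by my reading of the cephadm docs; the hostname and device path are the ones from the logs above, so please correct me if any of these steps are off:

# zap the device through the orchestrator instead of calling ceph-volume directly
ceph orch device zap node-osd5 /dev/sdah --force

# refresh the inventory and confirm /dev/sdah is reported as available again
ceph orch device ls node-osd5 --refresh

# the osd.osd_spec service should then redeploy the OSD on its own;
# watch for the new daemon on the host and for the OSD to appear in the tree
ceph orch ps node-osd5
ceph osd tree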