Take care when reading the output of "ceph osd metadata". When you are
running the OSD as an administered service, it's running in a container,
and a container behaves much like a miniature VM. So, for example, it may
report your OS as "CentOS Stream 8" even if your actual machine is running
Ubuntu.

The biggest pitfall is paths. In certain cases - definitely for OSDs - the
path for the OSD metadata and data store inside the container is
/var/lib/ceph/osd, while the actual path in the machine's OS is
/var/lib/ceph/<fsid>/osd; the container simply mounts the host path at its
internal path.
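
For a concrete comparison (a sketch only, assuming a cephadm/Podman
deployment and OSD 12; substitute your cluster's fsid for <fsid>):

# Host view: the OSD's directory lives under the cluster fsid
ls /var/lib/ceph/<fsid>/osd.12/

# Container view: enter the OSD's container and look at the same data
cephadm enter --name osd.12
ls /var/lib/ceph/osd/ceph-12/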
In other words, "ceph osd metadata" builds its report from data assembled
inside the containers, so the output is the OSD's internal view, not your
server's view.
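
If you need to map that internal view back to your server's devices, the
stable handles are the serials in "device_ids" and the by-path links in
"device_paths", not the /dev/sdX names. A rough sketch, run on the host
(<serial> is just a placeholder for the serial you are after):

# List host block devices together with their serial numbers
lsblk -o NAME,SERIAL,SIZE,MODEL

# Or find the current /dev node for a given serial
ls -l /dev/disk/by-id/ | grep <serial>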
Tim
On 10/28/24 14:01, Dave Hall wrote:
Hello.
Thanks to Robert's reply to 'Influencing the osd.id'
I've learned two new commands today. I can now see that 'ceph osd
metadata' confirms that I have two OSDs pointing to the same physical
disk name:
root@ceph09:/# ceph osd metadata 12 | grep sdi
    "bluestore_bdev_devices": "sdi",
    "device_ids": "nvme0n1=SAMSUNG_MZPLL1T6HEHP-00003_S3HBNA0KA03264,sdi=SEAGATE_ST12000NM0027_*ZJV5TX47*0000C9470ZWA",
    "device_paths": "nvme0n1=/dev/disk/by-path/pci-0000:83:00.0-nvme-1,sdi=/dev/disk/by-path/pci-0000:41:00.0-sas-phy18-lun-0",
    "devices": "nvme0n1,sdi",
    "objectstore_numa_unknown_devices": "nvme0n1,sdi",

root@ceph09:/# ceph osd metadata 9 | grep sdi
    "bluestore_bdev_devices": "sdi",
    "device_ids": "nvme1n1=Samsung_SSD_983_DCT_M.2_1.92TB_S48DNC0N701016D,sdi=SEAGATE_ST12000NM0027_*ZJV5SMTQ*0000C9128FE0",
    "device_paths": "nvme1n1=/dev/disk/by-path/pci-0000:01:00.0-nvme-1,sdi=/dev/disk/by-path/pci-0000:41:00.0-sas-phy6-lun-0",
    "devices": "nvme1n1,sdi",
    "objectstore_numa_unknown_devices": "sdi",
Even though the metadata for OSD 12 says sdi, at least it is pointing to
the serial number of the failed disk. However, the disk with that
serial number is currently residing at /dev/sdc.
Is there a way to force the record for the destroyed OSD to point to
/dev/sdc?
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
On Mon, Oct 28, 2024 at 11:47 AM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
Hello.
The following is on a Reef Podman installation:
In attempting to deal over the weekend with a failed OSD disk, I
have somehow managed to have two OSDs pointing to the same HDD, as
shown below.
[image.png: screenshot showing two OSDs mapped to the same HDD]
To be sure, the failure occurred on OSD.12, which was pointing to
/dev/sdi.
I disabled the systemd unit for OSD.12 because it kept
restarting. I then destroyed it.
When I physically removed the failed disk and rebooted the system,
the disk enumeration changed. So, before the reboot, OSD.12 was
using /dev/sdi. After the reboot, OSD.9 moved to /dev/sdi.
I didn't know that I had an issue until 'ceph-volume lvm prepare'
failed. It was while investigating this that I found the above.
Right now I have reinserted the failed disk and
rebooted, hoping that OSD.12 would find its old disk by some other
means, but no joy.
My concern is that if I run 'ceph osd rm' I could take out OSD.9.
I could take the precaution of marking OSD.9 out and letting it drain,
but I'd rather not. I am, perhaps, more inclined to manually
clear the lingering configuration associated with OSD.12 if
someone could send me the list of commands. Otherwise, I'm open to
suggestions.
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx