Re: Destroyed OSD clinging to wrong disk

Tim,

Thank you for your guidance.  Your points are completely understood.  It
was more that I couldn't figure out why the Dashboard was telling me that
the destroyed OSD was still using /dev/sdi when the physical disk with that
serial number was at /dev/sdc, and when another OSD was also reporting
/dev/sdi.  I figured that there must be some information buried somewhere.
I don't know where this metadata comes from or how it gets updated when
things like 'drive letters' change, but the metadata matched what the
dashboard showed, so now I know something new.
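For the record, the way I've been cross-checking which /dev node the failed
disk's serial number actually maps to now is roughly the following (commands
from memory, so adjust as needed; the serial is the one for OSD.12 from the
metadata output quoted below):

    lsblk -o NAME,SERIAL,SIZE                  # serial number per block device
    ls -l /dev/disk/by-id/ | grep ZJV5TX47     # shows which /dev/sdX the by-id link points at now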

Regarding the process for bringing the OSD back online with a new HDD, I am
still having some difficulties.  I used the steps in the Adding/Removing
OSDs document under Removing the OSD, and the OSD mostly appears to be
gone.  However, attempts to use 'ceph-volume lvm prepare' to build the
replacement OSD are failing.  Same thing with 'ceph orch daemon add
osd'.
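
In case it helps, the removal and re-add steps I've been attempting look
roughly like this (paraphrased from my shell history; placeholders are in
angle brackets, and ceph09 / /dev/sdc are just my host and the disk's
current device node):

    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12

followed by one or the other of:

    ceph-volume lvm prepare --data /dev/sdc --block.db <nvme-vg>/<db-lv>
    ceph orch daemon add osd ceph09:/dev/sdc

and it's those last two that are failing.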

I think the problem might be that the NVMe LV that was the WAL/DB for the
failed OSD did not get cleaned up, but on my systems 4 OSDs use the same
NVMe drive for WAL/DB, so I'm not sure how to proceed.
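
My tentative plan - and please tell me if this is wrong - is to identify the
stale DB LV and zap just that LV rather than the whole NVMe device, along
these lines (the LV name below is a placeholder):

    ceph-volume lvm list                                    # shows which LVs belong to which OSD
    ceph-volume lvm zap --destroy <nvme-vg>/<osd12-db-lv>   # remove only the old OSD.12 DB LV

so that the other three OSDs sharing the NVMe drive are untouched.  But I'd
rather hear from someone who has done this before I run it.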

Any suggestions would be welcome.

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx


On Tue, Oct 29, 2024 at 3:13 PM Tim Holloway <timh@xxxxxxxxxxxxx> wrote:

> Take care when reading the output of "ceph osd metadata". When you are
> running the OSD as an administered service, it's running in a container,
> and a container is a miniature VM. So, for example, it may report your
> OS as "CentOS Stream 8" even if your actual machine is running Ubuntu.
>
>
> The biggest pitfall is in paths, because in certain cases - definitely
> for OSDs - internally the path for the OSD metadata and data store will
> be /var/lib/ceph/osd, but the actual path in the machine's OS will be
> /var/lib/ceph/<fsid>/osd, where the container simply mounts that for its
> internal path.
>
> In other words, "ceph osd metadata" formulates its reports by having the
> containers assemble the report data and the output is thus the OSD's
> internal view, not your server's view.
>
>     Tim
>
>
> On 10/28/24 14:01, Dave Hall wrote:
> > Hello.
> >
> > Thanks to Robert's reply to 'Influencing the osd.id',
> > I've learned two new commands today.  I can now see that 'ceph osd
> > metadata'  confirms that I have two OSDs pointing to the same physical
> > disk name:
> >
> >     root@ceph09:/# ceph osd metadata 12 | grep sdi
> >         "bluestore_bdev_devices": "sdi",
> >         "device_ids": "nvme0n1=SAMSUNG_MZPLL1T6HEHP-00003_S3HBNA0KA03264,sdi=SEAGATE_ST12000NM0027_*ZJV5TX47*0000C9470ZWA",
> >         "device_paths": "nvme0n1=/dev/disk/by-path/pci-0000:83:00.0-nvme-1,sdi=/dev/disk/by-path/pci-0000:41:00.0-sas-phy18-lun-0",
> >         "devices": "nvme0n1,sdi",
> >         "objectstore_numa_unknown_devices": "nvme0n1,sdi",
> >     root@ceph09:/# ceph osd metadata 9 | grep sdi
> >         "bluestore_bdev_devices": "sdi",
> >         "device_ids": "nvme1n1=Samsung_SSD_983_DCT_M.2_1.92TB_S48DNC0N701016D,sdi=SEAGATE_ST12000NM0027_*ZJV5SMTQ*0000C9128FE0",
> >         "device_paths": "nvme1n1=/dev/disk/by-path/pci-0000:01:00.0-nvme-1,sdi=/dev/disk/by-path/pci-0000:41:00.0-sas-phy6-lun-0",
> >         "devices": "nvme1n1,sdi",
> >         "objectstore_numa_unknown_devices": "sdi",
> >
> >
> > However, even though OSD 12 is saying sdi, at least it is pointing to
> > the serial number of the failed disk.  The trouble is that the disk with
> > that serial number is currently residing at /dev/sdc.
> >
> > Is there a way to force the record for the destroyed OSD to point to
> > /dev/sdc?
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> >
> > On Mon, Oct 28, 2024 at 11:47 AM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> >     Hello.
> >
> >     The following is on a Reef Podman installation:
> >
> >     In attempting to deal over the weekend with a failed OSD disk, I
> >     have somehow managed to have two OSDs pointing to the same HDD, as
> >     shown below.
> >
> >     [screenshot: dashboard showing two OSDs on the same HDD]
> >
> >     To be sure, the failure occurred on OSD.12, which was pointing to
> >     /dev/sdi.
> >
> >     I disabled the systemd unit for OSD.12 because it kept
> >     restarting.  I then destroyed it.
> >
> >     When I physically removed the failed disk and rebooted the system,
> >     the disk enumeration changed.  So, before the reboot, OSD.12 was
> >     using /dev/sdi.  After the reboot, OSD.9 moved to /dev/sdi.
> >
> >     I didn't know that I had an issue until 'ceph-volume lvm prepare'
> >     failed.  It was in the process of investigating this that I found
> >     the above.  Right now I have reinserted the failed disk and
> >     rebooted, hoping that OSD.12 would find its old disk by some other
> >     means, but no joy.
> >
> >     My concern is that if I run 'ceph osd rm' I could take out OSD.9.
> >     I could take the precaution of marking OSD.9 out and let it drain,
> >     but I'd rather not.  I am, perhaps, more inclined to manually
> >     clear the lingering configuration associated with OSD.12 if
> >     someone could send me the list of commands. Otherwise, I'm open to
> >     suggestions.
> >
> >     Thanks.
> >
> >     -Dave
> >
> >     --
> >     Dave Hall
> >     Binghamton University
> >     kdhall@xxxxxxxxxxxxxx
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



