Re: Schrödinger's OSD

Hi Tim,

On Mon, 15 Jul 2024 at 07:51, Eugen Block <eblock@xxxxxx> wrote:

> If the OSD is already running in a container, adopting it won't work,
> as you already noticed. I don't have an explanation how the
> non-cephadm systemd unit has been created, but that should be fixed by
> disabling it.
>
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.
>
> Check which OSDs are active and remove the remainders of the orphaned
> directories, that should be fine. But be careful and check properly
> before actually removing anything and only remove one by one while
> watching the cluster status.
>
> Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:
>
> > OK. Phantom hosts are gone. Many thanks! I'll have to review my
> > checklist for decommissioning hosts to make sure that step is on it.
> >
> > On the legacy/container OSD stuff, that is a complete puzzle.
> >
> > While the first thing that I see when I look up "creating an OSD" in
> > the system documentation is the manual process, I've been using
> > cephadm long enough to know to dig past that. The manual process is
> > sufficiently tedious that I cannot think that I'd have used it by
> > accident. Especially since I set out with the explicit goal of using
> > cephadm. Yet here it is. This isn't an upgraded machine, it was
> > constructed within the last week from the ground up. So I have no
> > idea how the legacy definition got there. On two separate systems.
> >
> > The disable on the legacy OSD worked and the container is now
> > running. Although I'm not sure that it will survive a reboot, since
> > the legacy service is dynamically created on each reboot.
> >
> > This is what happens when I try to adopt:
> >
> >  cephadm adopt --style legacy --name osd.4
> > Pulling container image quay.io/ceph/ceph:v16...
> > Found online OSD at //var/lib/ceph/osd/ceph-4/fsid
> > objectstore_type is bluestore
> > Disabling old systemd unit ceph-osd@4...
> > Moving data...
> > Traceback (most recent call last):
> >   File "/usr/sbin/cephadm", line 9509, in <module>
> >     main()
> >   File "/usr/sbin/cephadm", line 9497, in main
> >     r = ctx.func(ctx)
> >   File "/usr/sbin/cephadm", line 2061, in _default_image
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 6043, in command_adopt
> >     command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
> >   File "/usr/sbin/cephadm", line 6210, in command_adopt_ceph
> >     move_files(ctx, glob(os.path.join(data_dir_src, '*')),
> >   File "/usr/sbin/cephadm", line 2215, in move_files
> >     os.symlink(src_rl, dst_file)
> > FileExistsError: [Errno 17] File exists: '/dev/vg_ceph/ceph0504' ->
> > '/var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block'
> >
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.
>
Is there any chance that ceph was accidentally installed on those systems?
It sounds very much like systemd is activating the OSDs through ceph-volume.
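
To see which OSD IDs still have a legacy directory, something like the
following might help. It is only a rough sketch: it assumes the layout shown
in the traceback above (legacy OSDs in /var/lib/ceph/osd/ceph-<id>, cephadm
OSDs in /var/lib/ceph/<fsid>/osd.<id>), and the function name
find_legacy_osds is made up for illustration. It only reads directory names
and changes nothing.

```python
from pathlib import Path


def find_legacy_osds(ceph_root="/var/lib/ceph"):
    """Return (osd_id, legacy_dir, cephadm_dirs) for each legacy OSD directory.

    Assumes the layout seen in this thread: legacy OSDs live in
    <ceph_root>/osd/ceph-<id>, cephadm OSDs in <ceph_root>/<fsid>/osd.<id>.
    """
    root = Path(ceph_root)
    results = []
    for legacy in sorted(root.glob("osd/ceph-*")):
        if not legacy.is_dir():
            continue
        # "ceph-4" -> "4"
        osd_id = legacy.name.split("-", 1)[1]
        # Any directory matching <ceph_root>/*/osd.<id> is a cephadm-style copy.
        cephadm_dirs = sorted(
            p for p in root.glob(f"*/osd.{osd_id}") if p.is_dir()
        )
        results.append((osd_id, legacy, cephadm_dirs))
    return results


if __name__ == "__main__":
    for osd_id, legacy, cephadm_dirs in find_legacy_osds():
        state = "also has a cephadm directory" if cephadm_dirs else "legacy only"
        print(f"osd.{osd_id}: {legacy} ({state})")
```

This does not tell you which copy is actually running, so it is a starting
point for the "check properly before removing" step, not a replacement for
it.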

Another, possibly related, issue I've seen while upgrading (some time ago):
the FSID changed somehow and I ended up with OSDs that had the wrong FSID.
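
A quick way to look for that kind of mismatch is sketched below. It assumes
that each OSD data directory contains a ceph_fsid file holding the cluster
FSID, and that you pass in the FSID you expect (for example the one in the
cephadm path from the traceback). The function name osds_with_foreign_fsid
is hypothetical. Directories without a ceph_fsid file are skipped, which may
include OSDs that are not currently activated.

```python
from pathlib import Path


def osds_with_foreign_fsid(expected_fsid, ceph_root="/var/lib/ceph"):
    """Return (osd_dir, found_fsid) for OSD dirs whose ceph_fsid differs.

    Assumption: every populated OSD data directory has a 'ceph_fsid' file
    containing the cluster FSID as a single line of text.
    """
    root = Path(ceph_root)
    # Cover both legacy (osd/ceph-<id>) and cephadm (<fsid>/osd.<id>) layouts.
    candidates = list(root.glob("osd/ceph-*")) + list(root.glob("*/osd.*"))
    mismatches = []
    for osd_dir in sorted(candidates):
        fsid_file = osd_dir / "ceph_fsid"
        if not fsid_file.is_file():
            # Nothing to compare against; skip rather than guess.
            continue
        found = fsid_file.read_text().strip()
        if found != expected_fsid:
            mismatches.append((osd_dir, found))
    return mismatches


if __name__ == "__main__":
    expected = "278fcd86-0861-11ee-a7df-9c5c8e86cf8f"  # FSID from the traceback
    for osd_dir, found in osds_with_foreign_fsid(expected):
        print(f"{osd_dir}: ceph_fsid is {found}, expected {expected}")
```

Reading these files usually needs root, and an empty result only means no
mismatch was found in the directories that could be read.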

Cheers,
Alwin