Hi Tim,

On Mon, 15 Jul 2024 at 07:51, Eugen Block <eblock@xxxxxx> wrote:
> If the OSD is already running in a container, adopting it won't work,
> as you already noticed. I don't have an explanation for how the
> non-cephadm systemd unit was created, but that should be fixed by
> disabling it.
>
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.
>
> Check which OSDs are active and remove the remainders of the orphaned
> directories, that should be fine. But be careful and check properly
> before actually removing anything, and only remove one by one while
> watching the cluster status.
>
> Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
>
> > OK. Phantom hosts are gone. Many thanks! I'll have to review my
> > checklist for decommissioning hosts to make sure that step is on it.
> >
> > On the legacy/container OSD stuff, that is a complete puzzle.
> >
> > While the first thing that I see when I look up "creating an OSD" in
> > the system documentation is the manual process, I've been using
> > cephadm long enough to know to dig past that. The manual process is
> > sufficiently tedious that I cannot think that I'd have used it by
> > accident, especially since I set out with the explicit goal of using
> > cephadm. Yet here it is. This isn't an upgraded machine; it was
> > constructed within the last week from the ground up. So I have no
> > idea how the legacy definition got there. On two separate systems.
> >
> > The disable on the legacy OSD worked and the container is now
> > running, although I'm not sure that it will survive a reboot, since
> > the legacy service is dynamically created on each reboot.
> >
> > This is what happens when I try to adopt:
> >
> > cephadm adopt --style legacy --name osd.4
> > Pulling container image quay.io/ceph/ceph:v16...
> > Found online OSD at //var/lib/ceph/osd/ceph-4/fsid
> > objectstore_type is bluestore
> > Disabling old systemd unit ceph-osd@4...
> > Moving data...
> > Traceback (most recent call last):
> >   File "/usr/sbin/cephadm", line 9509, in <module>
> >     main()
> >   File "/usr/sbin/cephadm", line 9497, in main
> >     r = ctx.func(ctx)
> >   File "/usr/sbin/cephadm", line 2061, in _default_image
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 6043, in command_adopt
> >     command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
> >   File "/usr/sbin/cephadm", line 6210, in command_adopt_ceph
> >     move_files(ctx, glob(os.path.join(data_dir_src, '*')),
> >   File "/usr/sbin/cephadm", line 2215, in move_files
> >     os.symlink(src_rl, dst_file)
> > FileExistsError: [Errno 17] File exists: '/dev/vg_ceph/ceph0504' ->
> > '/var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block'
> >
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.

By any chance was ceph accidentally installed on those systems? It sounds
very much like systemd is activating the OSDs through ceph-volume.
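You could check that with something like the following. The fsid and the
OSD id are taken from your traceback, but the exact ceph-volume unit
instance names on your hosts will differ, so treat this as a rough sketch
rather than exact commands:

  # legacy units that would (re)activate the OSD outside the container on boot
  systemctl list-units --all 'ceph-osd@*' 'ceph-volume@*'
  ceph-volume lvm list

  # the cephadm-managed unit for the same OSD
  systemctl status 'ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f@osd.4.service'

  # the destination symlink that made "cephadm adopt" stop
  ls -l /var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block

If enabled ceph-osd@/ceph-volume units show up there, that would explain why
the legacy /var/lib/ceph/osd/ceph-4 directory reappears on every reboot, and
the adopt failure is then expected: the containerized OSD already owns the
block symlink, so there is nothing left to move. In that case disabling (or
masking) the legacy units, as Eugen suggested, should be enough; adopting is
not needed.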
Another, maybe related, issue I've seen while upgrading (some time ago): the
FSID changed somehow and I ended up with OSDs that had the wrong FSID.

Cheers,
Alwin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx