Re: Schrödinger's OSD

The problem with merely disabling or masking the non-cephadm OSD is
that the offending systemd service unit lives under /run, not under
/lib/systemd or /etc/systemd.

As far as I know, essentially the entire contents of /run are destroyed
when you reboot, and that would include the disabled OSD unit. A new
copy would then get created as the system boot proceeded. I could, of
course, re-disable it after every reboot, but that's not a very pretty
solution. Better to determine why Ceph feels the need to create this
systemd service dynamically and persuade it not to.
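
For reference, something along these lines should show where the unit
actually comes from and knock it back down after each boot (osd.4 being
just the example instance here):

  # Show which unit file systemd is actually loading (expect a path under /run)
  systemctl show -p FragmentPath ceph-osd@4.service

  # Stop and disable it again until the root cause is found
  systemctl disable --now ceph-osd@4.service

  # Masking instead drops a /dev/null symlink under /etc/systemd/system,
  # which should take precedence over anything regenerated under /run
  systemctl mask ceph-osd@4.service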

I was kind of hoping that it came from finding that OSD directory under
/var/lib/ceph/osd, but as I said, I have another machine with TWO such
directories and only one manifests as a systemd service. The other
doesn't run at all, doesn't show up in the OSD tree, orch ps, or the
dashboard, and since as far as I'm concerned it doesn't exist anyway,
I won't complain about that. I just need to get the invalid stuff
excised safely.

Oh wait, one difference between the two directories under
/var/lib/ceph/osd is that the one that's running has files in it; the
one that isn't is just an empty directory. Which suggests that the cue
for creating the /run/ceph/osd service may be the detection of one of
the files there, and maybe I could risk ripping the unwanted directory
out. I think there are some symlinks, though, so I'll proceed with
caution.
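
Before I rip anything out, I'll sanity-check it along the lines Eugen
suggested, something like this (where ceph-4 stands in for whichever
directory turns out to be the orphan):

  # Is anything mounted on, or holding open, the candidate directory?
  findmnt /var/lib/ceph/osd/ceph-4
  lsof +D /var/lib/ceph/osd/ceph-4

  # Where do any block/block.db symlinks point?
  ls -l /var/lib/ceph/osd/ceph-4/

  # Remove only the genuinely orphaned directory, then watch cluster health
  ceph -s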

On the plus side, the autotuner seems to have finally kicked in. First
time I've seen HEALTH_OK in a while!
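
For the record, the state of the autotuner (assuming this means the
cephadm osd_memory_target autotuner) can be double-checked with
something like:

  # Is autotuning enabled, and what targets did it set?
  ceph config get osd osd_memory_target_autotune
  ceph config dump | grep osd_memory_target

  # And the overall cluster status
  ceph -s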

Alwin: Thanks for your interest. Both of the funny machines are
dedicated Ceph host nodes, so it's no accident that cephadm is
installed on them. And I've never had more than one fsid, so no issue
there.

If you're thinking about the "phantom host", that was just because of a
typing error when adding a new ceph host. That problem has now been
resolved.

  Tim

On Mon, 2024-07-15 at 05:49 +0000, Eugen Block wrote:
> If the OSD is already running in a container, adopting it won't work,
> as you already noticed. I don't have an explanation how the
> non-cephadm systemd unit has been created, but that should be fixed
> by disabling it.
> 
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.
> 
> Check which OSDs are active and remove the remainders of the orphaned
> directories, that should be fine. But be careful and check properly
> before actually removing anything, and only remove one by one while
> watching the cluster status.
> 
> Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> 
> > OK. Phantom hosts are gone. Many thanks! I'll have to review my
> > checklist for decommissioning hosts to make sure that step is on it.
> > 
> > On the legacy/container OSD stuff, that is a complete puzzle.
> > 
> > While the first thing that I see when I look up "creating an OSD" in
> > the system documentation is the manual process, I've been using
> > cephadm long enough to know to dig past that. The manual process is
> > sufficiently tedious that I cannot think that I'd have used it by
> > accident. Especially since I set out with the explicit goal of using
> > cephadm. Yet here it is. This isn't an upgraded machine, it was
> > constructed within the last week from the ground up. So I have no
> > idea how the legacy definition got there. On two separate systems.
> > 
> > The disable on the legacy OSD worked and the container is now
> > running. Although I'm not sure that it will survive a reboot, since
> > the legacy service is dynamically created on each reboot.
> > 
> > This is what happens when I try to adopt:
> > 
> >  cephadm adopt --style legacy --name osd.4
> > Pulling container image quay.io/ceph/ceph:v16...
> > Found online OSD at //var/lib/ceph/osd/ceph-4/fsid
> > objectstore_type is bluestore
> > Disabling old systemd unit ceph-osd@4...
> > Moving data...
> > Traceback (most recent call last):
> >   File "/usr/sbin/cephadm", line 9509, in <module>
> >     main()
> >   File "/usr/sbin/cephadm", line 9497, in main
> >     r = ctx.func(ctx)
> >   File "/usr/sbin/cephadm", line 2061, in _default_image
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 6043, in command_adopt
> >     command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
> >   File "/usr/sbin/cephadm", line 6210, in command_adopt_ceph
> >     move_files(ctx, glob(os.path.join(data_dir_src, '*')),
> >   File "/usr/sbin/cephadm", line 2215, in move_files
> >     os.symlink(src_rl, dst_file)
> > FileExistsError: [Errno 17] File exists: '/dev/vg_ceph/ceph0504' ->
> > '/var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block'
> > 
> > I have considered simply doing a brute-force removal of the OSD
> > files in /var/lib/ceph/osd but I'm not sure what ill effects might
> > ensue. I discovered that my other offending machine actually has TWO
> > legacy OSD directories, but only one of them is being used. The
> > other OSD is the remnant of a deletion and it's just dead files now.
> > 
> > 
> > 
> > On 7/13/24 02:39, Eugen Block wrote:
> > > Okay, it looks like you just need some further cleanup regarding 
> > > your phantom hosts, for example:
> > > 
> > > ceph osd crush remove www2
> > > ceph osd crush remove docker0
> > > 
> > > and so on.
> > > 
> > > Regarding the systemd unit (well, cephadm also generates one, but
> > > with the fsid as already mentioned), you could just stop and
> > > disable the old one:
> > > 
> > > systemctl disable --now ceph-osd@4
> > > 
> > > and see if the container takes over.
> > > 
> > > Was this your attempt to adopt an existing OSD from pre-cephadm?
> > > 
> > > > ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504
> > > 
> > > The recommended way would have been to adopt the device:
> > > 
> > > cephadm [--image your-custom-image] adopt --style legacy --name osd.4
> > > 
> > > locally on that host. The --image parameter is optional. Did you
> > > follow the docs [1] when you moved to cephadm? Anyway, since it
> > > somehow seems to work already, it's probably not that relevant
> > > anymore, I just wanted to point to it anyway.
> > > 
> > > [1] https://docs.ceph.com/en/latest/cephadm/adoption/
> > > 
> > > Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > 
> > > > This particular system has it both ways and neither wants to work.
> > > > 
> > > > The peculiar thing was that when I first re-created the OSD with
> > > > cephadm, it was reported that this was an "unmanaged node". So I
> > > > ran the same cephadm again and THAT time it showed up. So I
> > > > suspect that the ceph-osd@4.service was the first install and the
> > > > ceph-~~~~~@osd.4.service got added on the second try.
> > > > 
> > > > ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504
> > > > 
> > > > Here's the OSD tree:
> > > > 
> > > > ID   CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
> > > >  -1         7.00000  root default
> > > > -25         1.00000      host ceph01
> > > >   1    hdd  1.00000          osd.1        up   1.00000  1.00000
> > > > -28         1.00000      host ceph03
> > > >   3    hdd  1.00000          osd.3        up   1.00000  1.00000
> > > > -11         1.00000      host ceph04
> > > >   7    hdd  1.00000          osd.7        up   1.00000  1.00000
> > > > -31         1.00000      host ceph05
> > > >   4    hdd  1.00000          osd.4        up   1.00000  1.00000
> > > >  -9         1.00000      host ceph06
> > > >   2    hdd  1.00000          osd.2        up   1.00000  1.00000
> > > > -22         1.00000      host dell02
> > > >   5    ssd  1.00000          osd.5        up   1.00000  1.00000
> > > > -13               0      host docker0
> > > >  -5               0      host www2
> > > >  -3               0      host www6
> > > >  -7         1.00000      host www7
> > > >   0    hdd  1.00000          osd.0        up   1.00000  1.00000
> > > > 
> > > > Note that www2 was decommissioned and replaced by the ceph05
> > > > machine and no longer physically exists. www6 was the
> > > > original/admin ceph node with all the accumulated glop. It ran on
> > > > the base OS (not in a VM) and I have not attempted to re-create
> > > > that. Ceph remembers it from its own internal memory, as the
> > > > original www6 system drive was replaced with an SSD and the OS
> > > > was installed from scratch. The docker0 machine was never a ceph
> > > > node. It's the phantom that got yanked in because I accidentally
> > > > supplied its IP address when doing a ceph orch host add for a
> > > > completely different machine.
> > > > 
> > > > And here's the host list as the orchestrator sees it:
> > > > 
> > > > ceph orch host ls
> > > > HOST                           ADDR       LABELS      STATUS
> > > > ceph01.internal.mousetech.com  10.0.1.21
> > > > ceph03.internal.mousetech.com  10.0.1.53
> > > > ceph04.mousetech.com           10.0.1.14
> > > > ceph05.internal.mousetech.com  10.0.1.54
> > > > ceph06.internal.mousetech.com  10.0.1.56
> > > > dell02.mousetech.com           10.0.1.52  _admin rgw
> > > > www7.mousetech.com             10.0.1.7   rgw
> > > > 7 hosts in cluster
> > > > 
> > > > 
> > > > On Fri, 2024-07-12 at 22:15 +0000, Eugen Block wrote:
> > > > > Hi,
> > > > > 
> > > > > containerized daemons usually have the fsid in the systemd
> > > > > unit, like
> > > > > ceph-{fsid}@osd.5
> > > > > 
> > > > > Is it possible that you have those confused? Check the
> > > > > /var/lib/ceph/osd/ directory to find possible orphaned daemons
> > > > > and clean them up.
> > > > > And as previously stated, it would help to see your osd tree
> > > > > and which OSDs you're talking about.
> > > > > 
> > > > > Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > > 
> > > > > > Incidentally, I just noticed that my phantom host isn't
> > > > > > completely gone. It's not in the host list, either
> > > > > > command-line or dashboard, but it does list (with no assets)
> > > > > > as a host under "ceph osd tree".
> > > > > > 
> > > > > > ---
> > > > > > 
> > > > > > More seriously, I've been having problems with OSDs that
> > > > > > report as being both up and down at the same time.
> > > > > > 
> > > > > > This is on 2 new hosts. One host saw this when I made it the
> > > > > > _admin host. The other caught it because it's running in a VM
> > > > > > with the OSD mapped out as an imported disk, and the host OS
> > > > > > managed to flip which drive was sda and which was sdb,
> > > > > > resulting in having to delete and re-define the OSD in the VM.
> > > > > > 
> > > > > > But now the OSD on this VM reports as "UP/IN" on the
> > > > > > dashboard while it's "error" on "ceph orch ps", and on the
> > > > > > actual vbox, the osd container fails on startup. viz:
> > > > > > 
> > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) De>
> > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 osd.4 0 OSD:init: unable to mount object store
> > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1  ** ERROR: osd init failed: (16) Device or resource busy
> > > > > > 
> > > > > > Note that the truncated message above reads
> > > > > > 
> > > > > > bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) Device or resource busy
> > > > > > 
> > > > > > Rebooting doesn't help. Nor does freeing up resources and
> > > > > > stopping/starting processes manually. The problem eventually
> > > > > > cleared up spontaneously on the admin box, but I have no idea
> > > > > > why.
> > > > > > 
> > > > > > ---
> > > > > > 
> > > > > > Also noted that now the OSD on the admin box shows in ceph
> > > > > > orch ps as "stopped", though again, the dashboard lists it as
> > > > > > "UP/IN".
> > > > > > 
> > > > > > Here's what systemctl thinks about it:
> > > > > > 
> > > > > > systemctl status ceph-osd@5.service
> > > > > > ● ceph-osd@5.service - Ceph object storage daemon osd.5
> > > > > >      Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: disabled)
> > > > > >      Active: active (running) since Fri 2024-07-12 16:45:51 EDT; 1min 40s ago
> > > > > >     Process: 8511 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 5 (code=exited, status=0/SUCCESS)
> > > > > >    Main PID: 8517 (ceph-osd)
> > > > > >       Tasks: 70
> > > > > >      Memory: 478.6M
> > > > > >         CPU: 3.405s
> > > > > >      CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@5.service
> > > > > >              └─8517 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
> > > > > > 
> > > > > > Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Starting Ceph object storage daemon osd.5...
> > > > > > Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Started Ceph object storage daemon osd.5.
> > > > > > Jul 12 16:45:51 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:51.642-0400 7f2bd440c140 -1 Falling back to public interface
> > > > > > Jul 12 16:45:58 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:58.352-0400 7f2bd440c140 -1 osd.5 34161 log_to_monitors {default=true}
> > > > > > Jul 12 16:45:59 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:59.206-0400 7f2bcbbf0640 -1 osd.5 34161 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> > > > > > 
> > > > > > The actual container is not running.
> > > > > > 
> > > > > > Ceph version, incidentally, is 16.2.15. Except for that one
> > > > > > node that apparently didn't move up from Octopus (I'll be
> > > > > > nuking that one shortly).
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



