Interesting. And thanks for the info.

I did a quick look-around. The admin node, which is one of the mixed-OSD machines, has these packages installed:

centos-release-ceph-pacific-1.0-2.el9.noarch
cephadm-16.2.14-2.el9s.noarch
libcephfs2-16.2.15-1.el9s.x86_64
python3-ceph-common-16.2.15-1.el9s.x86_64
python3-ceph-argparse-16.2.15-1.el9s.x86_64
python3-cephfs-16.2.15-1.el9s.x86_64
libcephsqlite-16.2.15-1.el9s.x86_64
ceph-common-16.2.15-1.el9s.x86_64
ceph-base-16.2.15-1.el9s.x86_64
ceph-selinux-16.2.15-1.el9s.x86_64
ceph-mds-16.2.15-1.el9s.x86_64
ceph-mon-16.2.15-1.el9s.x86_64
ceph-osd-16.2.15-1.el9s.x86_64
ceph-prometheus-alerts-16.2.15-1.el9s.noarch
ceph-grafana-dashboards-16.2.15-1.el9s.noarch
ceph-mgr-dashboard-16.2.15-1.el9s.noarch
ceph-mgr-diskprediction-local-16.2.15-1.el9s.noarch
ceph-mgr-k8sevents-16.2.15-1.el9s.noarch
ceph-mgr-modules-core-16.2.15-1.el9s.noarch
ceph-mgr-16.2.15-1.el9s.x86_64
ceph-mgr-rook-16.2.15-1.el9s.noarch
ceph-16.2.15-1.el9s.x86_64

The other problem node has these:

centos-release-ceph-pacific-1.0-2.el9.noarch
python3-ceph-common-16.2.15-1.el9s.x86_64
libcephfs2-16.2.15-1.el9s.x86_64
python3-ceph-argparse-16.2.15-1.el9s.x86_64
python3-cephfs-16.2.15-1.el9s.x86_64
libcephsqlite-16.2.15-1.el9s.x86_64
ceph-mgr-modules-core-16.2.15-1.el9s.noarch
ceph-common-16.2.15-1.el9s.x86_64
ceph-base-16.2.15-1.el9s.x86_64
ceph-selinux-16.2.15-1.el9s.x86_64
ceph-mgr-diskprediction-local-16.2.15-1.el9s.noarch
ceph-mgr-16.2.15-1.el9s.x86_64
ceph-mds-16.2.15-1.el9s.x86_64
ceph-mon-16.2.15-1.el9s.x86_64
ceph-osd-16.2.15-1.el9s.x86_64
ceph-16.2.15-1.el9s.x86_64
cephadm-16.2.14-2.el9s.noarch

But a simple node hosting only an OSD and NFS looked like this:

centos-release-ceph-pacific-1.0-2.el9.noarch
cephadm-16.2.14-2.el9s.noarch

I did an in-place migration from AlmaLinux 8 to AlmaLinux 9 and that may have had side-effects. But who knows? I'll start ripping packages and see what happens.
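
For anyone following along, the audit/cleanup I have in mind looks roughly like the sketch below. It only illustrates Eugen's suggestion further down, not a tested recipe: the grep pattern and the list of packages to drop are assumptions taken from my own inventory above, so check your own rpm output before removing anything.

# Inventory everything Ceph-related that came in via the package manager
rpm -qa | grep -Ei 'ceph|rados|rbd' | sort

# On a cephadm-managed host, only cephadm (plus, optionally, ceph-common for the
# CLI) needs to come from packages; the daemons themselves run in containers.
# Dry-run first so dnf shows what it would pull out along with them:
dnf remove --assumeno ceph ceph-base ceph-osd ceph-mon ceph-mds ceph-mgr ceph-selinux

# If the removal list looks sane, repeat without --assumeno, then make sure the
# containerized daemons are still happy:
ceph orch ps --refresh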
On Tue, 2024-07-16 at 06:38 +0000, Eugen Block wrote:
> Do you have more ceph packages installed than just cephadm? If you have ceph-osd packages (or ceph-mon, ceph-mds etc.), I would remove them and clean up the directories properly. To me it looks like a mixup of "traditional" package based installation and cephadm deployment. Only you can tell how and why, but it's more important to clean that up and keep it consistent. You should keep the cephadm and optionally the ceph-common package, but the rest isn't required to run a cephadm cluster.
>
> Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
>
> > The problem with merely disabling or masking the non-cephadm OSD is that the offending systemd service unit lives under /run, not under /lib/systemd or /etc/systemd.
> >
> > As far as I know, essentially the entire /run directory's contents get destroyed when you reboot, and that would include the disabled OSD unit. Then a new copy would get created as the system boot proceeded. I could, of course, then re-disable it, but that's not a very pretty solution. Better to determine why Ceph feels the need to create this systemd service dynamically and persuade it not to.
> >
> > I was kind of hoping that it came from finding that OSD directory under /var/lib/ceph/osd, but as I said, I have another machine with TWO such directories and only one manifests as a systemd service. The other doesn't run at all, doesn't list in an osd tree, orch ps or the dashboard, and since, as far as I'm concerned, it doesn't exist anyway, I'll not complain about that. I just need to get the invalid stuff excised safely.
> >
> > Oh wait, one difference between the two /var/lib/ceph/osd's is that the one that's running has files, the one that isn't is just an empty directory. Which suggests that the cue for making the /run/ceph/osd service may be the detection of one of the files there, and maybe I could risk ripping the unwanted directory out. I think there are some softlinks, though, so I'll proceed with caution.
> >
> > On the plus side, the autotuner seems to have finally kicked in. First time I've seen "HEALTH OK" in a while!
> >
> > Alwin: Thanks for your interest. Both the funny machines are dedicated ceph host nodes, so it's no accident that cephadm is installed on them. And I've never had more than 1 fsid, so no issue there.
> >
> > If you're thinking about the "phantom host", that was just because of a typing error when adding a new ceph host. That problem has now been resolved.
> >
> > Tim
> >
> > On Mon, 2024-07-15 at 05:49 +0000, Eugen Block wrote:
> > > If the OSD is already running in a container, adopting it won't work, as you already noticed. I don't have an explanation of how the non-cephadm systemd unit has been created, but that should be fixed by disabling it.
> > >
> > > > I have considered simply doing a brute-force removal of the OSD files in /var/lib/ceph/osd but I'm not sure what ill effects might ensue. I discovered that my other offending machine actually has TWO legacy OSD directories, but only one of them is being used. The other OSD is the remnant of a deletion and it's just dead files now.
> > >
> > > Check which OSDs are active and remove the remainders of the orphaned directories, that should be fine. But be careful and check properly before actually removing anything, and only remove one by one while watching the cluster status.
> > >
> > > Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> > >
> > > > OK. Phantom hosts are gone. Many thanks! I'll have to review my checklist for decommissioning hosts to make sure that step is on it.
> > > >
> > > > On the legacy/container OSD stuff, that is a complete puzzle.
> > > >
> > > > While the first thing that I see when I look up "creating an OSD" in the system documentation is the manual process, I've been using cephadm long enough to know to dig past that. The manual process is sufficiently tedious that I cannot think that I'd have used it by accident. Especially since I set out with the explicit goal of using cephadm. Yet here it is. This isn't an upgraded machine, it was constructed within the last week from the ground up. So I have no idea how the legacy definition got there. On two separate systems.
> > > >
> > > > The disable on the legacy OSD worked and the container is now running. Although I'm not sure that it will survive a reboot, since the legacy service is dynamically created on each reboot.
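
(A note for anyone reproducing this: a quick way to confirm that the unit is only runtime-enabled, and to spot what re-creates it at boot. Unit and path names here are assumptions based on a stock package-style install; adjust for your own host.)

systemctl status ceph-osd@4 | grep -i loaded       # "enabled-runtime" means the enable symlink lives under /run and vanishes at reboot
systemctl cat ceph-osd@4 | head -n 1               # first line shows which unit file is actually being used
ls /run/systemd/system/ceph-osd.target.wants/ 2>/dev/null   # runtime enablement symlinks
systemctl list-unit-files 'ceph-volume@*'          # a persistently enabled ceph-volume@ unit is typically what re-activates the OSD at boot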
> > > > This is what happens when I try to adopt:
> > > >
> > > > cephadm adopt --style legacy --name osd.4
> > > > Pulling container image quay.io/ceph/ceph:v16...
> > > > Found online OSD at //var/lib/ceph/osd/ceph-4/fsid
> > > > objectstore_type is bluestore
> > > > Disabling old systemd unit ceph-osd@4...
> > > > Moving data...
> > > > Traceback (most recent call last):
> > > >   File "/usr/sbin/cephadm", line 9509, in <module>
> > > >     main()
> > > >   File "/usr/sbin/cephadm", line 9497, in main
> > > >     r = ctx.func(ctx)
> > > >   File "/usr/sbin/cephadm", line 2061, in _default_image
> > > >     return func(ctx)
> > > >   File "/usr/sbin/cephadm", line 6043, in command_adopt
> > > >     command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
> > > >   File "/usr/sbin/cephadm", line 6210, in command_adopt_ceph
> > > >     move_files(ctx, glob(os.path.join(data_dir_src, '*')),
> > > >   File "/usr/sbin/cephadm", line 2215, in move_files
> > > >     os.symlink(src_rl, dst_file)
> > > > FileExistsError: [Errno 17] File exists: '/dev/vg_ceph/ceph0504' -> '/var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block'
> > > >
> > > > I have considered simply doing a brute-force removal of the OSD files in /var/lib/ceph/osd but I'm not sure what ill effects might ensue. I discovered that my other offending machine actually has TWO legacy OSD directories, but only one of them is being used. The other OSD is the remnant of a deletion and it's just dead files now.
> > > >
> > > > On 7/13/24 02:39, Eugen Block wrote:
> > > > > Okay, it looks like you just need some further cleanup regarding your phantom hosts, for example:
> > > > >
> > > > > ceph osd crush remove www2
> > > > > ceph osd crush remove docker0
> > > > >
> > > > > and so on.
> > > > >
> > > > > Regarding the systemd unit (well, cephadm also generates one, but with the fsid as already mentioned), you could just stop and disable the old one:
> > > > >
> > > > > systemctl disable --now ceph-osd@4
> > > > >
> > > > > and see if the container takes over.
> > > > >
> > > > > Was this your attempt to adopt an existing OSD from pre-cephadm?
> > > > >
> > > > > > ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504
> > > > >
> > > > > The recommended way would have been to adopt the device:
> > > > >
> > > > > cephadm [--image your-custom-image] adopt --style legacy --name osd.4
> > > > >
> > > > > locally on that host. The --image parameter is optional. Did you follow the docs [1] when you moved to cephadm? Anyway, since it somehow seems to work already, it's probably not that relevant anymore, I just wanted to point to it anyway.
> > > > >
> > > > > [1] https://docs.ceph.com/en/latest/cephadm/adoption/
> > > > >
> > > > > Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > >
> > > > > > This particular system has it both ways and neither wants to work.
> > > > > >
> > > > > > The peculiar thing was that when I first re-created the OSD with cephadm, it was reported that this was an "unmanaged node". So I ran the same cephadm again and THAT time it showed up. So I suspect that the ceph-osd@4.service was the first install and the ceph-~~~~~@osd.4.service got added on the second try.
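
(A side note for the archive: when both flavors exist they are easy to tell apart by unit name. The OSD id, fsid and paths below are simply the ones from this thread; substitute your own.)

# On the affected host, both unit styles show up side by side:
systemctl list-units --all 'ceph*osd*'
#   ceph-osd@4.service                                        <- package/ceph-volume style
#   ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f@osd.4.service   <- cephadm style (fsid in the name)

# And the data directories each one points at:
ls -ld /var/lib/ceph/osd/ceph-4 /var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4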
> > > > > > ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504
> > > > > >
> > > > > > Here's the OSD tree:
> > > > > >
> > > > > > ID   CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
> > > > > >  -1         7.00000  root default
> > > > > > -25         1.00000      host ceph01
> > > > > >   1    hdd  1.00000          osd.1        up   1.00000  1.00000
> > > > > > -28         1.00000      host ceph03
> > > > > >   3    hdd  1.00000          osd.3        up   1.00000  1.00000
> > > > > > -11         1.00000      host ceph04
> > > > > >   7    hdd  1.00000          osd.7        up   1.00000  1.00000
> > > > > > -31         1.00000      host ceph05
> > > > > >   4    hdd  1.00000          osd.4        up   1.00000  1.00000
> > > > > >  -9         1.00000      host ceph06
> > > > > >   2    hdd  1.00000          osd.2        up   1.00000  1.00000
> > > > > > -22         1.00000      host dell02
> > > > > >   5    ssd  1.00000          osd.5        up   1.00000  1.00000
> > > > > > -13               0      host docker0
> > > > > >  -5               0      host www2
> > > > > >  -3               0      host www6
> > > > > >  -7         1.00000      host www7
> > > > > >   0    hdd  1.00000          osd.0        up   1.00000  1.00000
> > > > > >
> > > > > > Note that www2 was recommissioned and replaced by the ceph05 machine and no longer physically exists. www6 was the original/admin ceph node with all the accumulated glop. It ran on the base OS (not in a VM) and I have not attempted to re-create that. Ceph remembers it from its own internal memory, as the original www6 system drive was replaced with an SSD and the OS was installed from scratch. The docker0 machine was never a ceph node. It's the phantom that got yanked in because I accidentally supplied its IP address when doing a ceph orch host add for a completely different machine.
> > > > > >
> > > > > > And here's the host list as the orchestrator sees it:
> > > > > >
> > > > > > ceph orch host ls
> > > > > > HOST                           ADDR       LABELS      STATUS
> > > > > > ceph01.internal.mousetech.com  10.0.1.21
> > > > > > ceph03.internal.mousetech.com  10.0.1.53
> > > > > > ceph04.mousetech.com           10.0.1.14
> > > > > > ceph05.internal.mousetech.com  10.0.1.54
> > > > > > ceph06.internal.mousetech.com  10.0.1.56
> > > > > > dell02.mousetech.com           10.0.1.52  _admin rgw
> > > > > > www7.mousetech.com             10.0.1.7   rgw
> > > > > > 7 hosts in cluster
> > > > > >
> > > > > > On Fri, 2024-07-12 at 22:15 +0000, Eugen Block wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > containerized daemons usually have the fsid in the systemd unit, like ceph-{fsid}@osd.5
> > > > > > >
> > > > > > > Is it possible that you have those confused? Check the /var/lib/ceph/osd/ directory to find possible orphaned daemons and clean them up.
> > > > > > > And as previously stated, it would help to see your osd tree and which OSDs you're talking about.
> > > > > > >
> > > > > > > Zitat von Tim Holloway <timh@xxxxxxxxxxxxx>:
> > > > > > >
> > > > > > > > Incidentally, I just noticed that my phantom host isn't completely gone. It's not in the host list, either command-line or dashboard, but it does list (with no assets) as a host under "ceph osd tree".
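
(For anyone hitting the same thing: a leftover host bucket like that lives only in the CRUSH map, which is why the orchestrator's host list doesn't show it. The bucket name below is just the one from my tree; verify a bucket really is empty before removing it.)

ceph osd crush ls www6        # should print nothing if the host bucket is empty
ceph osd crush remove www6    # drop the empty bucket from the CRUSH map
ceph osd tree                 # confirm it no longer appears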
> > > > > > > > ---
> > > > > > > >
> > > > > > > > More seriously, I've been having problems with OSDs that report as being both up and down at the same time.
> > > > > > > >
> > > > > > > > This is on 2 new hosts. One host saw this when I made it the _admin host. The other caught it because it's running in a VM with the OSD mapped out as an imported disk and the host OS managed to flip which drive was sda and which was sdb, resulting in having to delete and re-define the OSD in the VM.
> > > > > > > >
> > > > > > > > But now the OSD on this VM reports as "UP/IN" on the dashboard while it's "error" on "ceph orch ps", and on the actual vbox the osd container fails on startup, viz:
> > > > > > > >
> > > > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) De>
> > > > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 osd.4 0 OSD:init: unable to mount object store
> > > > > > > > Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 ** ERROR: osd init failed: (16) Device or resource busy
> > > > > > > >
> > > > > > > > Note that the truncated message above reads
> > > > > > > >
> > > > > > > > bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) Device or resource busy
> > > > > > > >
> > > > > > > > Rebooting doesn't help. Nor does freeing up resources and stopping/starting processes manually. The problem eventually cleared up spontaneously on the admin box, but I have no idea why.
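
(In case it saves the next person some digging: an EBUSY on the bluestore block device usually means something else still has it open, e.g. a leftover package-based ceph-osd or a half-stopped container. A rough check, using the device path from the log above; lsof may need the resolved /dev/dm-* path, hence the readlink.)

ps -ef | grep '[c]eph-osd'                            # any ceph-osd running outside the container?
readlink -f /var/lib/ceph/osd/ceph-4/block            # resolve the LV the OSD's block symlink points at
lsof "$(readlink -f /var/lib/ceph/osd/ceph-4/block)"  # see who is holding the device open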
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Also noted that now the OSD on the admin box shows in ceph orch ps as "stopped", though again, the dashboard lists it as "UP/IN".
> > > > > > > >
> > > > > > > > Here's what systemctl thinks about it:
> > > > > > > >
> > > > > > > > systemctl status ceph-osd@5.service
> > > > > > > > ● ceph-osd@5.service - Ceph object storage daemon osd.5
> > > > > > > >      Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: disabled)
> > > > > > > >      Active: active (running) since Fri 2024-07-12 16:45:51 EDT; 1min 40s ago
> > > > > > > >     Process: 8511 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 5 (code=exited, status=0/SUCCESS)
> > > > > > > >    Main PID: 8517 (ceph-osd)
> > > > > > > >       Tasks: 70
> > > > > > > >      Memory: 478.6M
> > > > > > > >         CPU: 3.405s
> > > > > > > >      CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@5.service
> > > > > > > >              └─8517 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
> > > > > > > >
> > > > > > > > Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Starting Ceph object storage daemon osd.5...
> > > > > > > > Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Started Ceph object storage daemon osd.5.
> > > > > > > > Jul 12 16:45:51 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:51.642-0400 7f2bd440c140 -1 Falling back to public interface
> > > > > > > > Jul 12 16:45:58 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:58.352-0400 7f2bd440c140 -1 osd.5 34161 log_to_monitors {default=true}
> > > > > > > > Jul 12 16:45:59 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:59.206-0400 7f2bcbbf0640 -1 osd.5 34161 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
> > > > > > > >
> > > > > > > > The actual container is not running.
> > > > > > > >
> > > > > > > > Ceph version, incidentally, is 16.2.15. Except for that one node that apparently didn't move up from Octopus (I'll be nuking that one shortly).
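
(One more note for the archive: on a host in this in-between state, cephadm itself can show both flavors at once. A minimal sketch only; the JSON field names are from my recollection of Pacific-era cephadm and may differ slightly on other releases.)

# Run on the affected host; lists every daemon cephadm can see, packaged or containerized
cephadm ls | python3 -m json.tool | grep -E '"name"|"style"|"state"'
# Legacy daemons report "style": "legacy", cephadm-deployed ones "cephadm:v1".
# Anything still listed as legacy on a cephadm host is a candidate for adoption or removal.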
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx