Re: Schrödinger's OSD


 



If the OSD is already running in a container, adopting it won't work, as you already noticed. I don't have an explanation for how the non-cephadm systemd unit was created, but disabling it should fix that.

I have considered simply doing a brute-force removal of the OSD files in /var/lib/ceph/osd but I'm not sure what ill effects might ensue. I discovered that my other offending machine actually has TWO legacy OSD directories, but only one of them is being used. The other OSD is the remnant of a deletion and it's just dead files now.

Check which OSDs are active and remove the remainders of the orphaned directories; that should be fine. But be careful and verify everything before actually removing anything, and only remove them one by one while watching the cluster status.
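Roughly what I mean, as a sketch (the <id> and <fsid> below are placeholders, not taken from your output; only touch a directory once you're sure no running unit references it):

# which OSD units are actually running on this host
# (the containerized ones carry the cluster fsid in the unit name)
systemctl list-units --type=service --state=running 'ceph*osd*'

# legacy-style data dirs vs. cephadm-managed ones
ls -l /var/lib/ceph/osd/
ls -l /var/lib/ceph/<fsid>/

# move an orphaned legacy dir aside rather than deleting it outright
mv /var/lib/ceph/osd/ceph-<id> /root/ceph-<id>.orphaned

# confirm the cluster is still healthy before touching the next one
ceph -s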

Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:

OK. Phantom hosts are gone. Many thanks! I'll have to review my checklist for decommissioning hosts to make sure that step is on it.

On the legacy/container OSD stuff, that is a complete puzzle.

While the first thing I see when I look up "creating an OSD" in the system documentation is the manual process, I've been using cephadm long enough to know to dig past that. The manual process is sufficiently tedious that I can't imagine I'd have used it by accident, especially since I set out with the explicit goal of using cephadm. Yet here it is. This isn't an upgraded machine; it was constructed within the last week from the ground up. So I have no idea how the legacy definition got there, on two separate systems.

The disable on the legacy OSD worked and the container is now running, although I'm not sure it will survive a reboot, since the legacy service is dynamically re-created on each reboot.
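If that plain ceph-osd@4 unit keeps coming back after a reboot, masking it rather than just disabling it might be worth a try (just a guess at a workaround, not something I've verified here):

systemctl mask --now ceph-osd@4
# after the next boot this should report "masked" instead of "enabled-runtime"
systemctl is-enabled ceph-osd@4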

This is what happens when I try to adopt:

 cephadm adopt --style legacy --name osd.4
Pulling container image quay.io/ceph/ceph:v16...
Found online OSD at //var/lib/ceph/osd/ceph-4/fsid
objectstore_type is bluestore
Disabling old systemd unit ceph-osd@4...
Moving data...
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 9509, in <module>
    main()
  File "/usr/sbin/cephadm", line 9497, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 2061, in _default_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 6043, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/usr/sbin/cephadm", line 6210, in command_adopt_ceph
    move_files(ctx, glob(os.path.join(data_dir_src, '*')),
  File "/usr/sbin/cephadm", line 2215, in move_files
    os.symlink(src_rl, dst_file)
FileExistsError: [Errno 17] File exists: '/dev/vg_ceph/ceph0504' -> '/var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block'
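Presumably an earlier adopt attempt already created that block symlink, which is what the FileExistsError is complaining about. Before retrying, a sanity check I have in mind (paths taken straight from the traceback) would be to compare the two links:

ls -l /var/lib/ceph/osd/ceph-4/block
ls -l /var/lib/ceph/278fcd86-0861-11ee-a7df-9c5c8e86cf8f/osd.4/block
# if both already point at /dev/vg_ceph/ceph0504, the leftover link in the
# cephadm data dir could be moved aside before running cephadm adopt again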

I have considered simply doing a brute-force removal of the OSD files in /var/lib/ceph/osd but I'm not sure what ill effects might ensue. I discovered that my other offending machine actually has TWO legacy OSD directories, but only one of them is being used. The other OSD is the remnant of a deletion and it's just dead files now.



On 7/13/24 02:39, Eugen Block wrote:
Okay, it looks like you just need some further cleanup regarding your phantom hosts, for example:

ceph osd crush remove www2
ceph osd crush remove docker0

and so on.

Regarding the systemd unit (well, cephadm also generates one, but with the fsid as already mentioned), you could just stop and disable the old one:

systemctl disable --now ceph-osd@4

and see if the container takes over.

Was this your attempt to adopt an existing OSD from pre-cephadm?

ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504

The recommended way would have been to adopt the device:

cephadm [--image your-custom-image] adopt --style legacy --name osd.4

locally on that host. The --image parameter is optional. Did you follow the docs [1] when you moved to cephadm? Anyway, since it somehow seems to work already, it's probably not that relevant anymore; I just wanted to point it out.

[1] https://docs.ceph.com/en/latest/cephadm/adoption/
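If there are more legacy daemons left on any of the hosts, something like this should list them so they can be adopted one by one (just a sketch; it assumes cephadm's JSON output and that jq is installed on the host):

cephadm ls | jq -r '.[] | select(.style == "legacy") | .name'
# then, for each reported osd.<id>:
# cephadm adopt --style legacy --name osd.<id>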

Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:

This particular system has it both ways and neither wants to work.

The peculiar thing was that when I first re-created the OSD with cephadm, it was reported that this was an "unmanaged node". So I ran the same cephadm command again and THAT time it showed up. So I suspect that the ceph-osd@4.service was the first install and the ceph-~~~~~@osd.4.service got added on the second try.

ceph orch daemon add osd ceph05.internal.mousetech.com:vg_ceph/ceph0504

Here's the OSD tree:

ID   CLASS  WEIGHT   TYPE NAME         STATUS  REWEIGHT  PRI-AFF
 -1         7.00000  root default
-25         1.00000      host ceph01
  1    hdd  1.00000          osd.1         up   1.00000  1.00000
-28         1.00000      host ceph03
  3    hdd  1.00000          osd.3         up   1.00000  1.00000
-11         1.00000      host ceph04
  7    hdd  1.00000          osd.7         up   1.00000  1.00000
-31         1.00000      host ceph05
  4    hdd  1.00000          osd.4         up   1.00000  1.00000
 -9         1.00000      host ceph06
  2    hdd  1.00000          osd.2         up   1.00000  1.00000
-22         1.00000      host dell02
  5    ssd  1.00000          osd.5         up   1.00000  1.00000
-13               0      host docker0
 -5               0      host www2
 -3               0      host www6
 -7         1.00000      host www7
  0    hdd  1.00000          osd.0         up   1.00000  1.00000

Note that www2 was decommissioned and replaced by the ceph05 machine and no longer physically exists. www6 was the original/admin ceph node with all the accumulated glop. It ran on the base OS (not in a VM) and I have not attempted to re-create it. Ceph remembers it from its own internal memory, as the original www6 system drive was replaced with an SSD and the OS was installed from scratch. The docker0 machine was never a ceph node. It's the phantom that got yanked in because I accidentally supplied its IP address when doing a ceph orch host add for a completely different machine.

And here's the host list as the orchestrator sees it:

ceph orch host ls
HOST                           ADDR       LABELS      STATUS
ceph01.internal.mousetech.com  10.0.1.21
ceph03.internal.mousetech.com  10.0.1.53
ceph04.mousetech.com           10.0.1.14
ceph05.internal.mousetech.com  10.0.1.54
ceph06.internal.mousetech.com  10.0.1.56
dell02.mousetech.com           10.0.1.52  _admin rgw
www7.mousetech.com             10.0.1.7   rgw
7 hosts in cluster


On Fri, 2024-07-12 at 22:15 +0000, Eugen Block wrote:
Hi,

containerized daemons usually have the fsid in the systemd unit,
like
ceph-{fsid}@osd.5

Is it possible that you have those confused? Check the /var/lib/ceph/osd/ directory to find possible orphaned daemons and clean them up.
And as previously stated, it would help to see your osd tree and which OSDs you're talking about.
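For comparison, the two flavours of unit look like this side by side (<fsid> being your cluster fsid), which makes it easier to spot which one a given OSD is actually running under:

systemctl list-units 'ceph-osd@*' 'ceph*@osd*'
# ceph-osd@5.service           <- legacy, pre-cephadm unit
# ceph-<fsid>@osd.5.service    <- cephadm-managed container unit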

Quoting Tim Holloway <timh@xxxxxxxxxxxxx>:

Incidentally, I just noticed that my phantom host isn't completely gone. It's not in the host list, either command-line or dashboard, but it does list (with no assets) as a host under "ceph osd tree".
---

More seriously, I've been having problems with OSDs that report as
being both up and down at the same time.

This is on 2 new hosts. One host saw this when I made it the _admin host. The other caught it because it's running in a VM with the OSD mapped out as an imported disk, and the host OS managed to flip which drive was sda and which was sdb, resulting in having to delete and re-define the OSD in the VM.

But now the OSD on this VM reports as "UP/IN" on the dashboard while it's "error" in "ceph orch ps", and on the actual vbox the osd container fails on startup, viz:

Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) De>
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1 osd.4 0 OSD:init: unable to mount object store
Jul 12 20:06:48 ceph05.internal.mousetech.com ceph-278fcd86-0861-11ee-a7df-9c5c8e86cf8f-osd-4[4017]: debug 2024-07-12T20:06:48.056+0000 7fc17dfb9380 -1  ** ERROR: osd init failed: (16) Device or resource busy

Note that the truncated message above reads:

bdev(0x55e4853c4800 /var/lib/ceph/osd/ceph-4/block) open open got: (16) Device or resource busy

Rebooting doesn't help. Nor does freeing up resources and stopping/starting processes manually. The problem eventually cleared up spontaneously on the admin box, but I have no idea why.
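My understanding is that "Device or resource busy" on the block device means some other process still has it open, so next time it happens I plan to check with standard tools (nothing ceph-specific):

# what is still holding the OSD's block device open?
fuser -v /var/lib/ceph/osd/ceph-4/block
lsof /dev/vg_ceph/ceph0504
ps aux | grep '[c]eph-osd'   # a leftover non-container ceph-osd would show up here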

---

Also noted that now the OSD on the admin box shows in ceph orch ps as "stopped", though again, the dashboard lists it as "UP/IN".

Here's what systemctl thinks about it:

systemctl status ceph-osd@5.service
● ceph-osd@5.service - Ceph object storage daemon osd.5
     Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: disabled)
     Active: active (running) since Fri 2024-07-12 16:45:51 EDT; 1min 40s ago
    Process: 8511 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 5 (code=exited, status=0/SUCCESS)
   Main PID: 8517 (ceph-osd)
      Tasks: 70
     Memory: 478.6M
        CPU: 3.405s
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@5.service
             └─8517 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Starting Ceph object storage daemon osd.5...
Jul 12 16:45:51 dell02.mousetech.com systemd[1]: Started Ceph object storage daemon osd.5.
Jul 12 16:45:51 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:51.642-0400 7f2bd440c140 -1 Falling back to public interface
Jul 12 16:45:58 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:58.352-0400 7f2bd440c140 -1 osd.5 34161 log_to_monitors {default=true}
Jul 12 16:45:59 dell02.mousetech.com ceph-osd[8517]: 2024-07-12T16:45:59.206-0400 7f2bcbbf0640 -1 osd.5 34161 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory

The actual container is not running.
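A quick way to see which of the two is actually serving osd.5 (a sketch; it assumes podman is the container runtime here):

podman ps --format '{{.Names}}' | grep osd-5   # the cephadm container, if any
systemctl is-active ceph-osd@5                 # the legacy unit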

Ceph version, incidentally, is 16.2.15, except for that one node that apparently didn't move up from Octopus (I'll be nuking that one shortly).

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



