Thanks Eugen and others for the advice. These are not, however, LVM-based
OSDs. I can get a list of what is out there with:
cephadm ceph-volume raw list
and tried
cephadm ceph-volume raw activate
but it tells me I need to manually run activate.
I was able to find the correct data disks with for example:
ceph-bluestore-tool show-label --dev /dev/sda2
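To match labels to the down OSDs more systematically, I scripted it roughly as below; the JSON shape and field names are just my reading of show-label's output, so treat this as a sketch:

```python
import json

# Sketch: match show-label output against the fsids of the down OSDs.
# SAMPLE_LABEL mimics the shape `ceph-bluestore-tool show-label` prints
# (a JSON object keyed by device path); the field names are assumptions.
SAMPLE_LABEL = """{
  "/dev/sda2": {
    "osd_uuid": "74f4ce9c-4623-41b7-a7f9-cc81bb9467ef",
    "description": "main"
  }
}"""

def match_devices(label_json, wanted_uuids):
    """Return {osd_uuid: device} for labels whose osd_uuid we recognise."""
    labels = json.loads(label_json)
    return {meta["osd_uuid"]: dev
            for dev, meta in labels.items()
            if meta.get("osd_uuid") in wanted_uuids}
```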
but on running e.g.
cephadm ceph-volume raw activate --osd-id 20 --device /dev/sda --osd-uuid
74f4ce9c-4623-41b7-a7f9-cc81bb9467ef --block.db /dev/nvme1n1p1 --block.wal
/dev/nvme0n1p1
(OSD ID inferred from the list of down OSDs)
I got an error that "systemd support not yet implemented". After adding
--no-systemd to the command, I got the response:
stderr KeyError: 'osd_id'
The on-disk metadata indeed doesn't have an osd_id for most entries. For
the one instance I can find with the osd_id key in the metadata,
"cephadm ceph-volume raw activate" completes, but with no apparent change
to the system.
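I'm wondering whether writing the missing key back with ceph-bluestore-tool's set-label-key subcommand is the right move. A sketch of how I'd generate those commands (untested on this cluster; field names assumed from the labels I can see):

```python
def set_label_commands(labels, osd_id_by_uuid):
    """labels: {device: parsed show-label dict};
    osd_id_by_uuid: {osd_uuid: osd id}, e.g. inferred from the down OSDs.
    Returns set-label-key invocations for labels missing osd_id."""
    cmds = []
    for dev, meta in labels.items():
        if "osd_id" in meta:
            continue  # label already carries the id
        uuid = meta.get("osd_uuid")
        if uuid in osd_id_by_uuid:
            cmds.append(
                f"ceph-bluestore-tool set-label-key --dev {dev} "
                f"-k osd_id -v {osd_id_by_uuid[uuid]}"
            )
    return cmds
```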
Is there any advice on how to recover the configuration for raw (non-LVM)
OSDs?
And then once I have things added back in: the host is currently listed as
offline in the output of "ceph orch host ls". How can it be re-added to
this list?
Thank you,
Peter
BTW, the full error message:
Inferring fsid ed7b2c16-b053-45e2-a1fe-bf3474f90508
Using ceph image with id '59248721b0c7' and tag 'v17' created on 2024-04-24
16:06:51 +0000 UTC
quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a48737a57ec14eea7ce036733233
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint
/usr/sbin/ceph-volume --privileged --group-add=disk --init -e
CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a48737a57ec14eea7ce036733233
-e NODE_NAME=ceph-osd3 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
/var/log/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508:/var/log/ceph:z -v
/dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpjox0_hj0:/etc/ceph/ceph.conf:z
quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a48737a57ec14eea7ce036733233
raw activate --osd-id 20 --device /dev/sda --osd-uuid
74f4ce9c-4623-41b7-a7f9-cc81bb9467ef --block.db /dev/nvme1n1p1 --block.wal
/dev/nvme0n1p1 --no-systemd
/usr/bin/docker: stderr Traceback (most recent call last):
/usr/bin/docker: stderr File "/usr/sbin/ceph-volume", line 11, in <module>
/usr/bin/docker: stderr load_entry_point('ceph-volume==1.0.0',
'console_scripts', 'ceph-volume')()
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
/usr/bin/docker: stderr self.main(self.argv)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in
newfunc
/usr/bin/docker: stderr return f(*a, **kw)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
/usr/bin/docker: stderr terminal.dispatch(self.mapper, subcommand_args)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in
dispatch
/usr/bin/docker: stderr instance.main()
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/main.py", line
32, in main
/usr/bin/docker: stderr terminal.dispatch(self.mapper, self.argv)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in
dispatch
/usr/bin/docker: stderr instance.main()
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/activate.py",
line 166, in main
/usr/bin/docker: stderr systemd=not self.args.no_systemd)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in
is_root
/usr/bin/docker: stderr return func(*a, **kw)
/usr/bin/docker: stderr File
"/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/activate.py",
line 79, in activate
/usr/bin/docker: stderr osd_id = meta['osd_id']
/usr/bin/docker: stderr KeyError: 'osd_id'
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 9679, in <module>
main()
File "/usr/sbin/cephadm", line 9667, in main
r = ctx.func(ctx)
File "/usr/sbin/cephadm", line 2116, in _infer_config
return func(ctx)
File "/usr/sbin/cephadm", line 2061, in _infer_fsid
return func(ctx)
File "/usr/sbin/cephadm", line 2144, in _infer_image
return func(ctx)
File "/usr/sbin/cephadm", line 2019, in _validate_fsid
return func(ctx)
File "/usr/sbin/cephadm", line 6272, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd(),
verbosity=CallVerbosity.QUIET_UNLESS_ERROR)
File "/usr/sbin/cephadm", line 1807, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host --entrypoint
/usr/sbin/ceph-volume --privileged --group-add=disk --init -e
CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a48737a57ec14eea7ce036733233
-e NODE_NAME=ceph-osd3 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
/var/log/ceph/ed7b2c16-b053-45e2-a1fe-bf3474f90508:/var/log/ceph:z -v
/dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpjox0_hj0:/etc/ceph/ceph.conf:z
quay.io/ceph/ceph@sha256:96f2a53bc3028eec16e790c6225e7d7acad8a48737a57ec14eea7ce036733233
raw activate --osd-id 20 --device /dev/sda --osd-uuid
74f4ce9c-4623-41b7-a7f9-cc81bb9467ef --block.db /dev/nvme1n1p1 --block.wal
/dev/nvme0n1p1 --no-systemd
On Wed, 24 Apr 2024 at 14:47, Eugen Block <eblock@xxxxxx> wrote:
In addition to Nico's response: three years ago I wrote a blog post
[1] about that topic, maybe that can help as well. It might be a bit
outdated; what it definitely doesn't contain is this command from the
docs [2], to run once the server has been re-added to the host list:
ceph cephadm osd activate <host>
Regards,
Eugen
[1]
https://heiterbiswolkig.blogs.nde.ag/2021/02/08/cephadm-reusing-osds-on-reinstalled-server/
[2]
https://docs.ceph.com/en/latest/cephadm/services/osd/#activate-existing-osds
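Roughly, the sequence from the docs would look like the below (hostname is the example from this thread; I haven't verified it against your cluster, so take it as a sketch):

```python
# Sketch of the docs' sequence for re-adding a reinstalled host and
# re-activating its existing OSDs. Run the printed commands from a node
# with an admin keyring; the hostname is this thread's example host.
def readd_host_commands(host):
    return [
        f"ssh-copy-id -f -i /etc/ceph/ceph.pub root@{host}",  # push the cluster's SSH key
        f"ceph orch host add {host}",                         # back into `ceph orch host ls`
        f"ceph cephadm osd activate {host}",                  # adopt the existing OSDs
    ]

for cmd in readd_host_commands("ceph-osd3"):
    print(cmd)
```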
Zitat von Nico Schottelius <nico.schottelius@xxxxxxxxxxx>:
Hey Peter,
the /var/lib/ceph directories mainly contain "metadata" that, depending
on the Ceph version and OSD setup, can even reside on tmpfs by default.
Even if the data was on disk, it is easy to recreate:
--------------------------------------------------------------------------------
[root@rook-ceph-osd-36-6876cdb479-4764r ceph-36]# ls -l
total 28
lrwxrwxrwx 1 ceph ceph 8 Feb 7 12:12 block -> /dev/sde
-rw------- 1 ceph ceph 37 Feb 7 12:12 ceph_fsid
-rw------- 1 ceph ceph 37 Feb 7 12:12 fsid
-rw------- 1 ceph ceph 56 Feb 7 12:12 keyring
-rw------- 1 ceph ceph 6 Feb 7 12:12 ready
-rw------- 1 ceph ceph 3 Feb 7 12:12 require_osd_release
-rw------- 1 ceph ceph 10 Feb 7 12:12 type
-rw------- 1 ceph ceph 3 Feb 7 12:12 whoami
[root@rook-ceph-osd-36-6876cdb479-4764r ceph-36]#
--------------------------------------------------------------------------------
We used to create OSDs manually on Alpine Linux some years ago using
[0]; you can check it out as inspiration for what should go in which
file.
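As a minimal sketch of recreating those files (paths and contents below are placeholders; take the real values from the bluestore label and the cluster keyring):

```python
import os

def write_osd_dir(path, block_dev, values):
    """Recreate the small metadata files under /var/lib/ceph/osd/ceph-<id>.
    `values` maps filename -> contents (whoami, fsid, ceph_fsid, keyring,
    type, ready, ...), taken from the bluestore label and the cluster."""
    os.makedirs(path, exist_ok=True)
    # 'block' is a symlink to the data device
    link = os.path.join(path, "block")
    if not os.path.islink(link):
        os.symlink(block_dev, link)
    for name, content in values.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(content + "\n")
```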
BR,
Nico
[0]
https://code.ungleich.ch/ungleich-public/ungleich-tools/src/branch/master/ceph/ceph-osd-create-start-alpine
Peter van Heusden <pvh@xxxxxxxxxxx> writes:
Dear Ceph Community
We have 5 OSD servers running Ceph v15.2.17. The host operating system
is Ubuntu 20.04.
One of the servers has suffered corruption to its boot operating system.
Using a system rescue disk it is possible to mount the root filesystem,
but it is not possible to boot the operating system at the moment.
The OSDs are configured with (spinning disk) data drives, with WALs and
DBs on partitions of SSDs, but from my examination of the filesystem the
configuration in /var/lib/ceph appears to be corrupted.
So my question is: what is the best option for repair going forward? Is
it possible to do a clean install of the operating system and scan the
existing drives in order to reconstruct the OSD configuration?
Thank you,
Peter
P.S. the cause of the original corruption is likely due to an unplanned
power outage, an event that hopefully will not recur.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx