bbk, It did help! Thank you.
Here's a slightly more detailed procedure, with the osd-fsid details filled in, for moving a set of containerized ('dockerized') OSD drives to a replacement server/motherboard (or to the same server with a blank/freshly reinstalled OS). It applies when the new setup will have the same hostname as the retired/replaced one, and when you'd rather not just wait for the usual redundancy mechanisms to refill fresh or freshly wiped drives from the other copies.
1. Get the new (or newly reinstalled) server fully up to date and running, with the same hostname as the old one, and validate that the host is 'ceph ready':
cephadm prepare-host
Make sure the ceph public key is in /root/.ssh/authorized_keys:
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@TargetHost
Be sure you can 'ssh in' from a few other ceph cluster hosts.
If you previously had mons, mds, mgrs etc. configured to run on that host, you should notice after a couple of minutes that ceph has brought them back into the cluster. (Not that running a bunch of those on the same host as OSDs is a good idea, but just in case.) To gain confidence this will work, don't do the further steps until everything checks out and the only thing left to do is restore the OSDs (ps aux will show the related mon, mgr, mds or other containers, if any are running).
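For example, a quick sanity check at this point could look like the following (just a suggestion; it assumes the orchestrator module is active and podman is the container runtime, as elsewhere in this thread):
# from another cluster host: the rebuilt node should be listed and the cluster should look sane
ceph orch host ls
ceph -s
# on the rebuilt host itself: see which mon/mgr/mds containers ceph has already redeployed
podman ps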
2. Install the OSD drives and reboot. (There will be LVM PVs/VGs on the OSD drives, but no ceph containers attached to them yet.)
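One way to confirm that state (just a suggestion; vgs comes with LVM, and ceph-volume names its volume groups starting with 'ceph-'):
# the ceph-created volume groups should be visible again after the reboot...
vgs | grep ceph
# ...but no OSD containers should be running yet
podman ps | grep osd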
3. Run:
ceph config generate-minimal-conf
then use the details from its output to make a template file that looks like this:
osd.X.json:
{
  "config": "# minimal ceph.conf for 4067126d-or-whatever\n[global]\n\tfsid = 4067126d-or-whatever\n\tmon_host = [v2:[fc00:..etcetcetc]\n",
  "keyring": "[osd.X]\n\tkey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n"
}
Note the parsers for the above are really, really picky about spaces, so
get it exactly right.
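One possible way to avoid hand-editing those escapes is to build the file from live cluster output. A rough sketch only: it assumes jq is installed, uses osd 9 as the example ID, and relies on ceph auth get returning a keyring section the daemon will accept (caps lines included):
ID=9   # example OSD number
CONF="$(ceph config generate-minimal-conf)"
# the grep guards against versions that print the 'exported keyring for ...' banner on stdout
KEYRING="$(ceph auth get osd.$ID | grep -v '^exported keyring')"
# jq escapes the embedded tabs/newlines, so the picky parser gets valid JSON
jq -n --arg config "$CONF"$'\n' --arg keyring "$KEYRING"$'\n' \
   '{config: $config, keyring: $keyring}' > osd.$ID.json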
4. cephadm ceph-volume lvm list
You should see listed there the OSDs on the drives plugged into the system. What you want from it (later on) is each OSD's 'osd fsid'.
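For example, to pull out just the lines you will need (this relies only on the plain-text output format shown further down in this thread):
cephadm ceph-volume lvm list | grep -E 'osd\.|osd fsid'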
5. For each OSD without a running container, run: ceph auth get osd.[ID]
6. cp osd.X.json osd.[ID].json
7. Edit osd.[ID].json: change the key to the result of step 5, and the X in [osd.X] to the OSD number.
8. Copy the osd-fsid for the correct volume from step 4.
9. Fix up this command to match your situation (the cluster fsid, osd.X, the osd-fsid, and osd.X.json):
cephadm deploy --name osd.X --fsid like-4067126d-whatever --osd-fsid FOR-THAT-SPECIFIC_OSD_X_from-step-4 --config-json osd.X.json
That will create a container with the OSD code in it, and restore it to
the cluster.
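If the deploy succeeds, it's worth confirming the daemon really came up, for instance (the unit name here just follows the ceph-<cluster-fsid>@osd.X pattern, using the same placeholders as above):
systemctl status ceph-4067126d-or-whatever@osd.X
# from any cluster host, the restored OSD should show up again and go 'up'
ceph osd tree
ceph -s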
HTH
Harry
On 12/10/21 04:05, bbk wrote:
Hi,
I'd like to answer myself :-) I finally found the rest of my documentation... So after reinstalling the OS, the OSD config must also be created.
Here is what I have done; maybe this helps someone:
------------------
Get the information:
```
cephadm ceph-volume lvm list
ceph config generate-minimal-conf
ceph auth get osd.[ID]
```
Now create a minimal OSD config:
```
vi osd.[ID].json
```
```
{
"config": "# minimal ceph.conf for 6d0ecf22-9155-4684-971a-2f6cde8628c8\n[global]\n\tfsid = 6d0ecf22-9155-4684-971a-2f6cde8628c8\n\tmon_host = [v2:192.168.6.21:3300/0,v1:192.168.6.21:6789/0] [v2:192.168.6.22:3300/0,v1:192.168.6.22:6789/0] [v2:192.168.6.23:3300/0,v1:192.168.6.23:6789/0] [v2:192.168.6.24:3300/0,v1:192.168.6.24:6789/0] [v2:192.168.6.25:3300/0,v1:192.168.6.25:6789/0]\n",
"keyring": "[osd.XXX]\n\tkey = XXXXXXXXXXXXXXXXXXXX\n"
}
```
Deploy the OSD daemon (--osd-fsid takes the 'osd fsid' value from the lvm list output, not the numeric OSD id):
```
cephadm deploy --fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8 --osd-fsid [OSD_FSID] --name osd.[ID] --config-json osd.[ID].json
```
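A quick (optional) check that the daemon started and the OSD is back in the cluster:
```
systemctl status ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.[ID].service
ceph osd tree
```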
Yours,
bbk
On Thu, 2021-12-09 at 18:35 +0100, bbk wrote:
After reading my mail, it may not be clear that I reinstalled the OS of a node with OSDs.
On Thu, 2021-12-09 at 18:10 +0100, bbk wrote:
Hi,
The last time I reinstalled a node with OSDs, I added the disks with the following command. But unfortunately this time I ran into an error.
It seems like this time the command doesn't create the container; I am able to run `cephadm shell`, and other daemons (mon, mgr, mds) are running.
I don't know if that is the right way to do it?
~# cephadm deploy --fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8 --osd-fsid 941c6cb6-6898-4aa2-a33a-cec3b6a95cf1 --name osd.9
Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd-9
/usr/bin/podman: stderr Error: error inspecting object: no such container ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd-9
Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd.9
/usr/bin/podman: stderr Error: error inspecting object: no such container ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd.9
Deploy daemon osd.9 ...
Non-zero exit code 1 from systemctl start ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9
systemctl: stderr Job for ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 8571, in <module>
    main()
  File "/usr/sbin/cephadm", line 8559, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1787, in _default_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 4549, in command_deploy
    ports=daemon_ports)
  File "/usr/sbin/cephadm", line 2677, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "/usr/sbin/cephadm", line 2906, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/usr/sbin/cephadm", line 1467, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: systemctl start ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9
~# cephadm ceph-volume lvm list

====== osd.9 =======

  [block]       /dev/ceph-07fa2bb7-628f-40c0-8725-0266926371c0/osd-block-941c6cb6-6898-4aa2-a33a-cec3b6a95cf1

      block device              /dev/ceph-07fa2bb7-628f-40c0-8725-0266926371c0/osd-block-941c6cb6-6898-4aa2-a33a-cec3b6a95cf1
      block uuid                mVEhfF-LK4E-Dtmb-Jj23-tn8x-lpLy-KiUy1a
      cephx lockbox secret
      cluster fsid              6d0ecf22-9155-4684-971a-2f6cde8628c8
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  941c6cb6-6898-4aa2-a33a-cec3b6a95cf1
      osd id                    9
      type                      block
      vdo                       0
      devices                   /dev/sdd
~# podman --version
podman version 3.2.3

~# cephadm version
Using recent ceph image quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
~# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.5 (Ootpa)
Release: 8.5
Codename: Ootpa
~# cephadm shell
Inferring fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8
Using recent ceph image quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
[ceph: root@hobro /]#
Yours,
bbk
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx