bbk, It did help! Thank you.
Here's a slightly more detailed procedure, with the osd-fsid details filled in, for moving a set of containerized ('dockerized') OSD drives to a replacement server/motherboard (or to the same server with a blank/freshly reinstalled OS). It applies when the new setup will have the same hostname as the retired/replaced one, and when you'd rather not just wait for the usual redundancy mechanisms to refill fresh or freshly wiped drives from the other copies.
1. Get the new (or newly reinstalled) server fully up to date and running, with the same hostname as the old one, and validate that the host is 'ceph ready':
cephadm prepare-host
Make sure the ceph public key is in /root/.ssh/authorized_keys:
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub root@TargetHost
Be sure you can 'ssh in' from a few other ceph cluster hosts.
If you previously had mons, mds, mgrs etc. configured to run on that host, you should notice after a couple of minutes that ceph has brought them back into the cluster. (Not that running a bunch of those on the same host as OSDs is a good idea, but just in case.) To gain confidence this will work, don't do the further steps until everything checks out and the only thing left to do is restore the OSDs (ps aux will show the related mon, mgr, mds or other containers, if any are running).
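For example, a quick sanity check at this point could look like the following (just a suggestion; it assumes the orchestrator module is active and podman is the container runtime, as elsewhere in this thread):
# from another cluster host: the rebuilt node should be listed and the cluster should look sane
ceph orch host ls
ceph -s
# on the rebuilt host itself: see which mon/mgr/mds containers ceph has already redeployed
podman ps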
2. Install the OSD drives and reboot. (There will be LVM PVs/VGs on the OSD drives, but no ceph containers attached to them yet.)
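One way to confirm that state (just a suggestion; vgs comes with LVM, and ceph-volume names its volume groups starting with 'ceph-'):
# the ceph-created volume groups should be visible again after the reboot...
vgs | grep ceph
# ...but no OSD containers should be running yet
podman ps | grep osd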
3. Run:
ceph config generate-minimal-conf
then use the details from its output to make a template file that looks like this:
osd.X.json:
{
  "config": "# minimal ceph.conf for 4067126d-or-whatever\n[global]\n\tfsid = 4067126d-or-whatever\n\tmon_host = [v2:[fc00:..etcetcetc]\n",
  "keyring": "[osd.X]\n\tkey = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n"
}
Note the parsers for the above are really, really picky about spaces, so
get it exactly right.
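One possible way to avoid hand-editing those escapes is to build the file from live cluster output. A rough sketch only: it assumes jq is installed, uses osd 9 as the example ID, and relies on ceph auth get returning a keyring section the daemon will accept (caps lines included):
ID=9   # example OSD number
CONF="$(ceph config generate-minimal-conf)"
# the grep guards against versions that print the 'exported keyring for ...' banner on stdout
KEYRING="$(ceph auth get osd.$ID | grep -v '^exported keyring')"
# jq escapes the embedded tabs/newlines, so the picky parser gets valid JSON
jq -n --arg config "$CONF"$'\n' --arg keyring "$KEYRING"$'\n' \
   '{config: $config, keyring: $keyring}' > osd.$ID.json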
4. cephadm ceph-volume lvm list
You should see listed there the OSDs on the drives plugged into the system. What you want from it (later on) is each OSD's 'osd fsid'.
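For example, to pull out just the lines you will need (this relies only on the plain-text output format shown further down in this thread):
cephadm ceph-volume lvm list | grep -E 'osd\.|osd fsid'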
5. For each OSD without a running container, run: ceph auth get osd.[ID]
6. cp osd.X.json osd.[ID].json
7. Edit osd.[ID].json: change the key to the result of step 5, and the X in [osd.X] to the OSD number.
8. Copy the osd-fsid for the correct volume from step 4.
9. Fix up this command to match your situation (the cluster fsid, osd.X, the osd-fsid, and osd.X.json):
cephadm deploy --name osd.X --fsid like-4067126d-whatever --osd-fsid FOR-THAT-SPECIFIC_OSD_X_from-step-4 --config-json osd.X.json
That will create a container with the OSD code in it, and restore it to
the cluster.
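If the deploy succeeds, it's worth confirming the daemon really came up, for instance (the unit name here just follows the ceph-<cluster-fsid>@osd.X pattern, using the same placeholders as above):
systemctl status ceph-4067126d-or-whatever@osd.X
# from any cluster host, the restored OSD should show up again and go 'up'
ceph osd tree
ceph -s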
HTH
Harry
On 12/10/21 04:05, bbk wrote:
Hi,
I'd like to answer myself :-) I finally found the rest of my documentation... So after reinstalling the OS, the OSD config must also be created.
Here is what I have done; maybe this helps someone:
------------------
Get the information:
```
cephadm ceph-volume lvm list
ceph config generate-minimal-conf
ceph auth get osd.[ID]
```
Now create a minimal OSD config:
```
vi osd.[ID].json
```
```
{
"config": "# minimal ceph.conf for 6d0ecf22-9155-4684-971a-2f6cde8628c8\n[global]\n\tfsid = 6d0ecf22-9155-4684-971a-2f6cde8628c8\n\tmon_host = [v2:192.168.6.21:3300/0,v1:192.168.6.21:6789/0] [v2:192.168.6.22:3300/0,v1:192.168.6.22:6789/0] [v2:192.168.6.23:3300/0,v1:192.168.6.23:6789/0] [v2:192.168.6.24:3300/0,v1:192.168.6.24:6789/0] [v2:192.168.6.25:3300/0,v1:192.168.6.25:6789/0]\n",
"keyring": "[osd.XXX]\n\tkey = XXXXXXXXXXXXXXXXXXXX\n"
}
```
Deploy the OSD daemon (--osd-fsid takes the 'osd fsid' value from the lvm list output, not the numeric OSD id):
```
cephadm deploy --fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8 --osd-fsid [OSD_FSID] --name osd.[ID] --config-json osd.[ID].json
```
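A quick (optional) check that the daemon started and the OSD is back in the cluster:
```
systemctl status ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.[ID].service
ceph osd tree
```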
Yours,
bbk
On Thu, 2021-12-09 at 18:35 +0100, bbk wrote:
After reading my mail, it may not be clear that I reinstalled the OS of a node with OSDs.
On Thu, 2021-12-09 at 18:10 +0100, bbk wrote:
Hi,
The last time I reinstalled a node with OSDs, I added the disks with the following command. But unfortunately this time I ran into an error.
It seems like this time the command doesn't create the container; I am able to run `cephadm shell`, and other daemons (mon, mgr, mds) are running.
I don't know if that is the right way to do it?
~# cephadm deploy --fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8 --osd-fsid 941c6cb6-6898-4aa2-a33a-cec3b6a95cf1 --name osd.9
Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd-9
/usr/bin/podman: stderr Error: error inspecting object: no such container ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd-9
Non-zero exit code 125 from /usr/bin/podman container inspect --format {{.State.Status}} ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd.9
/usr/bin/podman: stderr Error: error inspecting object: no such container ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8-osd.9
Deploy daemon osd.9 ...
Non-zero exit code 1 from systemctl start ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9
systemctl: stderr Job for ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 8571, in <module>
    main()
  File "/usr/sbin/cephadm", line 8559, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1787, in _default_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 4549, in command_deploy
    ports=daemon_ports)
  File "/usr/sbin/cephadm", line 2677, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "/usr/sbin/cephadm", line 2906, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/usr/sbin/cephadm", line 1467, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: systemctl start ceph-6d0ecf22-9155-4684-971a-2f6cde8628c8@osd.9
~# cephadm ceph-volume lvm list

====== osd.9 =======

  [block]       /dev/ceph-07fa2bb7-628f-40c0-8725-0266926371c0/osd-block-941c6cb6-6898-4aa2-a33a-cec3b6a95cf1

      block device              /dev/ceph-07fa2bb7-628f-40c0-8725-0266926371c0/osd-block-941c6cb6-6898-4aa2-a33a-cec3b6a95cf1
      block uuid                mVEhfF-LK4E-Dtmb-Jj23-tn8x-lpLy-KiUy1a
      cephx lockbox secret
      cluster fsid              6d0ecf22-9155-4684-971a-2f6cde8628c8
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  941c6cb6-6898-4aa2-a33a-cec3b6a95cf1
      osd id                    9
      type                      block
      vdo                       0
      devices                   /dev/sdd
~# podman --version
podman version 3.2.3

~# cephadm version
Using recent ceph image quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
~# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.5 (Ootpa)
Release: 8.5
Codename: Ootpa
~# cephadm shell
Inferring fsid 6d0ecf22-9155-4684-971a-2f6cde8628c8
Using recent ceph image quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
[ceph: root@hobro /]#
Yours,
bbk
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx