Re: How to add back stray OSD daemon after node re-installation

Eugen Block <eblock@xxxxxx> · Thu, 27 May 2021 13:28:57 +0000

Can you try with both the cluster fsid and the OSD fsid? Something like this:

pacific2:~ # cephadm deploy --name osd.2 --fsid acbb46d6-bde3-11eb-9cf2-fa163ebb2a74 --osd-fsid bc241cd4-e284-4c5a-aad2-5744632fc7fc
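
A minimal sketch of assembling that invocation from its three inputs, in case it is useful for scripting. The helper name build_deploy_cmd is hypothetical, the IDs are the test-cluster values from the command above (substitute your own), and the code only builds and prints the argument list; it does not run cephadm.

```python
import shlex


def build_deploy_cmd(name, cluster_fsid, osd_fsid):
    """Return the argv for 'cephadm deploy' with both IDs set (hypothetical helper)."""
    return [
        "cephadm", "deploy",
        "--name", name,
        "--fsid", cluster_fsid,   # fsid of the cluster
        "--osd-fsid", osd_fsid,   # fsid of the individual OSD
    ]


cmd = build_deploy_cmd(
    "osd.2",
    "acbb46d6-bde3-11eb-9cf2-fa163ebb2a74",
    "bc241cd4-e284-4c5a-aad2-5744632fc7fc",
)
# Print a shell-quoted command line for review before running it by hand.
print(shlex.join(cmd))
```

Both IDs can be read from the "cluster fsid" and "osd fsid" fields of 'cephadm ceph-volume lvm list' on the affected node.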

I tried to reproduce a similar scenario and found a missing config file in the osd directory:

Error: statfs /var/lib/ceph/acbb46d6-bde3-11eb-9cf2-fa163ebb2a74/osd.2/config: no such file or directory

Check your syslog for more information on why the OSD fails to start.
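
The two checks above (is the config file there, and what does the log say) could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are invented for illustration, the directory layout is taken from the error message above, and the systemd unit name pattern ceph-<cluster fsid>@<daemon>.service is an assumption about how cephadm names its units, so confirm it with systemctl on your host.

```python
from pathlib import Path


def osd_config_path(cluster_fsid, daemon_name, base="/var/lib/ceph"):
    """Path of the per-daemon config file, following the layout in the error above."""
    return Path(base) / cluster_fsid / daemon_name / "config"


def unit_name(cluster_fsid, daemon_name):
    """Assumed systemd unit name for a cephadm-managed daemon (verify locally)."""
    return f"ceph-{cluster_fsid}@{daemon_name}.service"


fsid = "acbb46d6-bde3-11eb-9cf2-fa163ebb2a74"  # test-cluster fsid from the error above
cfg = osd_config_path(fsid, "osd.2")

# Report whether the config file exists on this host.
print(f"{cfg}: {'present' if cfg.is_file() else 'missing'}")
# Print (not run) a journalctl command that would show the unit's log.
print("journalctl -u", unit_name(fsid, "osd.2"))
```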

Quoting mabi <mabi@xxxxxxxxxxxxx>:

You are right, I used the FSID of the OSD and not that of the cluster in
the deploy command. I have now tried again with the cluster ID as the FSID,
but it still does not work, as you can see below:

ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 8d47792c-987d-11eb-9bb6-a5302e00e1fa
Deploy daemon osd.2 ...
Traceback (most recent call last):
  File "/usr/local/sbin/cephadm", line 6223, in <module>
    r = args.func()
  File "/usr/local/sbin/cephadm", line 1440, in _default_image
    return func()
  File "/usr/local/sbin/cephadm", line 3457, in command_deploy
    deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
  File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
    deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
  File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
    assert osd_fsid
AssertionError

In case it helps, here is the output of the "cephadm ceph-volume lvm list" command:

====== osd.2 =======

  [block]       /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677

      block device              /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677
      block uuid                W3omTg-vami-RB0V-CkVb-cgpb-88Jy-pIK2Tz
      cephx lockbox secret
      cluster fsid              8d47792c-987d-11eb-9bb6-a5302e00e1fa
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  91a86f20-8083-40b1-8bf1-fe35fac3d677
      osd id                    2
      osdspec affinity          all-available-devices
      type                      block
      vdo                       0
      devices                   /dev/sda
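
The two IDs needed for the deploy command appear in this listing as "cluster fsid" and "osd fsid". A rough sketch of pulling them out of the plain-text output follows; the function name parse_lvm_list is hypothetical, the sample is an excerpt of the listing above, and the parsing assumes keys and values are separated by at least two spaces as shown there. ceph-volume can also emit JSON (--format json), which is likely the more robust thing to parse.

```python
import re


def parse_lvm_list(text):
    """Collect 'key  value' pairs per OSD from 'ceph-volume lvm list' text output."""
    osds, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"=+\s*(osd\.\d+)\s*=+", stripped)
        if header:
            # A line such as '====== osd.2 =======' starts a new OSD section.
            current = osds.setdefault(header.group(1), {})
            continue
        # Keys and values are separated by a run of two or more spaces;
        # lines without a value (e.g. an empty lockbox secret) are skipped.
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        if current is not None and len(parts) == 2:
            current[parts[0]] = parts[1]
    return osds


sample = """\
====== osd.2 =======

      cephx lockbox secret
      cluster fsid              8d47792c-987d-11eb-9bb6-a5302e00e1fa
      osd fsid                  91a86f20-8083-40b1-8bf1-fe35fac3d677
      osd id                    2
"""

info = parse_lvm_list(sample)["osd.2"]
print(info["cluster fsid"], info["osd fsid"])
```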

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, May 27, 2021 12:32 PM, Eugen Block <eblock@xxxxxx> wrote:

> ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677
> Deploy daemon osd.2 ...

Which fsid is it, the cluster's or the OSD's? According to the
'cephadm deploy' help page it should be the cluster fsid.

Quoting mabi <mabi@xxxxxxxxxxxxx>:

> Hi Eugen,
> What a good coincidence ;-)
> So I ran "cephadm ceph-volume lvm list" on the OSD node which I
> re-installed and it saw my osd.2 OSD. So far so good, but the
> following suggested command does not work, as you can see below:
> ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677
> Deploy daemon osd.2 ...
> Traceback (most recent call last):
>   File "/usr/local/sbin/cephadm", line 6223, in <module>
>     r = args.func()
>   File "/usr/local/sbin/cephadm", line 1440, in _default_image
>     return func()
>   File "/usr/local/sbin/cephadm", line 3457, in command_deploy
>     deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
>   File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
>     deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
>   File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
>     assert osd_fsid
> AssertionError
> Any ideas what is wrong here?
> Regards,
> Mabi
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, May 27, 2021 12:13 PM, Eugen Block <eblock@xxxxxx> wrote:
>
> > Hi,
> > I posted a link to the docs [1], [2] just yesterday ;-)
> > You should see the respective OSD in the output of 'cephadm
> > ceph-volume lvm list' on that node. You should then be able to get it
> > back to cephadm with
> > cephadm deploy --name osd.x
> > But I haven't tried this yet myself, so please report back if that
> > works for you.
> > Regards,
> > Eugen
> > [1] https://tracker.ceph.com/issues/49159
> > [2] https://tracker.ceph.com/issues/46691
> > Quoting mabi <mabi@xxxxxxxxxxxxx>:
> >
> > > Hello,
> > > I mistakenly re-installed the OS of an OSD node of my Octopus
> > > cluster (managed by cephadm). Luckily the OSD data is on a separate
> > > disk and did not get affected by the re-install.
> > > Now I have the following state:
> > >
> > >     health: HEALTH_WARN
> > >             1 stray daemon(s) not managed by cephadm
> > >             1 osds down
> > >             1 host (1 osds) down
> > >
> > >
> > > To fix that I tried to run:
> > > ceph orch daemon add osd ceph1f:/dev/sda
> > > Created no osd(s) on host ceph1f; already created?
> > > That did not work, so I tried:
> > > ceph cephadm osd activate ceph1f
> > > no valid command found; 10 closest matches:
> > > ...
> > > Error EINVAL: invalid command
> > > That did not work either. So I wanted to ask: how can I "adopt" an
> > > OSD disk back into my cluster?
> > > Thanks for your help.
> > > Regards,
> > > Mabi
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx