Re: How to add back stray OSD daemon after node re-installation

It works again, but first I had to do a stop/start of the OSD from an admin node:

# ceph orch daemon stop osd.2
# ceph orch daemon start osd.2

What an adventure, thanks again so much for your help!
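For the archives, here is the complete sequence that worked for me, pieced together from this thread. The fsids come from "cephadm ceph-volume lvm list" on the re-installed node (cluster fsid and osd fsid), and "osd.1" as the copy source for the config file is just an example; any other healthy osd directory in the cluster should work, since it's only a minimal ceph.conf.

On the re-installed OSD node:

# cephadm ceph-volume lvm list
# cp /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.1/config \
     /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/config
# cephadm deploy --name osd.2 --fsid 8d47792c-987d-11eb-9bb6-a5302e00e1fa \
     --osd-fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677

Then from an admin node:

# ceph orch daemon stop osd.2
# ceph orch daemon start osd.2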

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, May 27, 2021 3:37 PM, Eugen Block <eblock@xxxxxx> wrote:

> That file is in the regular filesystem; you can copy it from a
> different osd directory, it's just a minimal ceph.conf. The directory
> for the failing osd should now be present after the failed attempts.
>
> Zitat von mabi mabi@xxxxxxxxxxxxx:
>
> > Nicely spotted about the missing file, it looks like I have the same
> > case as you can see below from the syslog:
> > May 27 15:33:12 ceph1f systemd[1]:
> > ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Scheduled
> > restart job, restart counter is at 1.
> > May 27 15:33:12 ceph1f systemd[1]: Stopped Ceph osd.2 for
> > 8d47792c-987d-11eb-9bb6-a5302e00e1fa.
> > May 27 15:33:12 ceph1f systemd[1]: Starting Ceph osd.2 for
> > 8d47792c-987d-11eb-9bb6-a5302e00e1fa...
> > May 27 15:33:12 ceph1f kernel: [19332.481779] overlayfs:
> > unrecognized mount option "volatile" or missing value
> > May 27 15:33:13 ceph1f kernel: [19332.709205] overlayfs:
> > unrecognized mount option "volatile" or missing value
> > May 27 15:33:13 ceph1f kernel: [19332.933442] overlayfs:
> > unrecognized mount option "volatile" or missing value
> > May 27 15:33:13 ceph1f bash[64982]: Error: statfs
> > /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/config: no
> > such file or directory
> > May 27 15:33:13 ceph1f systemd[1]:
> > ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Control
> > process exited, code=exited, status=125/n/a
> > So how do I generate/create that missing
> > /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/config file?
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, May 27, 2021 3:28 PM, Eugen Block eblock@xxxxxx wrote:
> >
> > > Can you try with both cluster and osd fsid? Something like this:
> > > pacific2:~ # cephadm deploy --name osd.2 --fsid
> > > acbb46d6-bde3-11eb-9cf2-fa163ebb2a74 --osd-fsid
> > > bc241cd4-e284-4c5a-aad2-5744632fc7fc
> > > I tried to reproduce a similar scenario and found a missing config
> > > file in the osd directory:
> > > Error: statfs
> > > /var/lib/ceph/acbb46d6-bde3-11eb-9cf2-fa163ebb2a74/osd.2/config: no
> > > such file or directory
> > > Check your syslog for more information why the osd start fails.
> > > Zitat von mabi mabi@xxxxxxxxxxxxx:
> > >
> > > > You are right, I used the FSID of the OSD and not of the cluster in
> > > > the deploy command. So now I tried again with the cluster ID as FSID
> > > > but still it does not work as you can see below:
> > > > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid
> > > > 8d47792c-987d-11eb-9bb6-a5302e00e1fa
> > > > Deploy daemon osd.2 ...
> > > > Traceback (most recent call last):
> > > >   File "/usr/local/sbin/cephadm", line 6223, in <module>
> > > >     r = args.func()
> > > >   File "/usr/local/sbin/cephadm", line 1440, in _default_image
> > > >     return func()
> > > >   File "/usr/local/sbin/cephadm", line 3457, in command_deploy
> > > >     deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
> > > >   File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
> > > >     deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
> > > >   File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
> > > >     assert osd_fsid
> > > > AssertionError
> > > > In case that's of any help here is the output of the "cephadm
> > > > ceph-volume lvm list" command:
> > > > ====== osd.2 =======
> > > > [block]  /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > >     block device          /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > >     block uuid            W3omTg-vami-RB0V-CkVb-cgpb-88Jy-pIK2Tz
> > > >     cephx lockbox secret
> > > >     cluster fsid          8d47792c-987d-11eb-9bb6-a5302e00e1fa
> > > >     cluster name          ceph
> > > >     crush device class    None
> > > >     encrypted             0
> > > >     osd fsid              91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > >     osd id                2
> > > >     osdspec affinity      all-available-devices
> > > >     type                  block
> > > >     vdo                   0
> > > >     devices               /dev/sda
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Thursday, May 27, 2021 12:32 PM, Eugen Block eblock@xxxxxx wrote:
> > > >
> > > > > > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid
> > > > >
> > > > > > 91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > > > > Deploy daemon osd.2 ...
> > > > >
> > > > > Which fsid is it, the cluster's or the OSD's? According to the
> > > > > 'cephadm deploy' help page it should be the cluster fsid.
> > > > > Zitat von mabi mabi@xxxxxxxxxxxxx:
> > > > >
> > > > > > Hi Eugen,
> > > > > > What a good coincidence ;-)
> > > > > > So I ran "cephadm ceph-volume lvm list" on the OSD node which I
> > > > > > re-installed and it saw my osd.2 OSD. So far so good, but the
> > > > > > following suggested command does not work as you can see below:
> > > > > > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid
> > > > > > 91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > > > > Deploy daemon osd.2 ...
> > > > > > Traceback (most recent call last):
> > > > > >   File "/usr/local/sbin/cephadm", line 6223, in <module>
> > > > > >     r = args.func()
> > > > > >   File "/usr/local/sbin/cephadm", line 1440, in _default_image
> > > > > >     return func()
> > > > > >   File "/usr/local/sbin/cephadm", line 3457, in command_deploy
> > > > > >     deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
> > > > > >   File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
> > > > > >     deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
> > > > > >   File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
> > > > > >     assert osd_fsid
> > > > > > AssertionError
> > > > > > Any ideas what is wrong here?
> > > > > > Regards,
> > > > > > Mabi
> > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > > On Thursday, May 27, 2021 12:13 PM, Eugen Block eblock@xxxxxx wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I posted a link to the docs [1], [2] just yesterday ;-)
> > > > > > > You should see the respective OSD in the output of 'cephadm
> > > > > > > ceph-volume lvm list' on that node. You should then be able
> > > > > > > to get it back to cephadm with
> > > > > > > cephadm deploy --name osd.x
> > > > > > > But I haven't tried this yet myself, so please report back if that
> > > > > > > works for you.
> > > > > > > Regards,
> > > > > > > Eugen
> > > > > > > [1] https://tracker.ceph.com/issues/49159
> > > > > > > [2] https://tracker.ceph.com/issues/46691
> > > > > > > Zitat von mabi mabi@xxxxxxxxxxxxx:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > > I have by mistake re-installed the OS of an OSD node of my Octopus
> > > > > > > > cluster (managed by cephadm). Luckily the OSD data is on
> > > > > > > > a separate disk and did not get affected by the re-install.
> > > > > > > > Now I have the following state:
> > > > > > > >
> > > > > > > >     health: HEALTH_WARN
> > > > > > > >             1 stray daemon(s) not managed by cephadm
> > > > > > > >             1 osds down
> > > > > > > >             1 host (1 osds) down
> > > > > > > >
> > > > > > > >
> > > > > > > > To fix that I tried to run:
> > > > > > > > ceph orch daemon add osd ceph1f:/dev/sda
> > > > > > > > Created no osd(s) on host ceph1f; already created?
> > > > > > > > That did not work, so I tried:
> > > > > > > > ceph cephadm osd activate ceph1f
> > > > > > > > no valid command found; 10 closest matches:
> > > > > > > > ...
> > > > > > > > Error EINVAL: invalid command
> > > > > > > > Did not work either. So I wanted to ask how can I "adopt" back an
> > > > > > > > OSD disk to my cluster?
> > > > > > > > Thanks for your help.
> > > > > > > > Regards,
> > > > > > > > Mabi

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



