I managed to remove that wrongly created cluster on the node by running:

sudo cephadm rm-cluster --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677 --force

So I am getting closer, but the osd.2 service on that node simply does not want to start, as you can see below:

# ceph orch daemon start osd.2
Scheduled to start osd.2 on host 'ceph1f'

# ceph orch ps | grep osd.2
osd.2    ceph1f    unknown    2m ago    -    <unknown>    <unknown>    <unknown>    <unknown>

In the log files I see the following:

5/27/21 2:47:34 PM [ERR] `ceph1f: cephadm unit osd.2 start` failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1451, in _daemon_action
    ['--name', name, a])
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1168, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:
stderr Job for ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service failed because the control process exited with error code.
stderr See "systemctl status ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "<stdin>", line 6159, in <module>
  File "<stdin>", line 1310, in _infer_fsid
  File "<stdin>", line 3655, in command_unit
  File "<stdin>", line 1072, in call_throws
RuntimeError: Failed command: systemctl start ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2

5/27/21 2:47:34 PM [ERR] cephadm exited with an error code: 1, stderr:
stderr Job for ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service failed because the control process exited with error code.
stderr See "systemctl status ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "<stdin>", line 6159, in <module>
  File "<stdin>", line 1310, in _infer_fsid
  File "<stdin>", line 3655, in command_unit
  File "<stdin>", line 1072, in call_throws
RuntimeError: Failed command: systemctl start ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1168, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:
stderr Job for ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service failed because the control process exited with error code.
stderr See "systemctl status ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service" and "journalctl -xe" for details.
Traceback (most recent call last):
  File "<stdin>", line 6159, in <module>
  File "<stdin>", line 1310, in _infer_fsid
  File "<stdin>", line 3655, in command_unit
  File "<stdin>", line 1072, in call_throws
RuntimeError: Failed command: systemctl start ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2

And finally the systemctl "status" of that osd.2 service on the OSD node:

ubuntu@ceph1f:/var/lib/ceph$ sudo systemctl status ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service
● ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service - Ceph osd.2 for 8d47792c-987d-11eb-9bb6-a5302e00e1fa
     Loaded: loaded (/etc/systemd/system/ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2021-05-27 14:48:24 CEST; 20s ago
    Process: 56163 ExecStartPre=/bin/rm -f //run/ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service-pid //run/ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service-cid (code=exited, status=0/SUCCESS)
    Process: 56164 ExecStart=/bin/bash /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/unit.run (code=exited, status=127)
    Process: 56165 ExecStopPost=/bin/bash /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/unit.poststop (code=exited, status=127)
    Process: 56166 ExecStopPost=/bin/rm -f //run/ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service-pid //run/ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service-cid (code=exited, status=0/SUCCESS)

May 27 14:48:14 ceph1f systemd[1]: Failed to start Ceph osd.2 for 8d47792c-987d-11eb-9bb6-a5302e00e1fa.
May 27 14:48:24 ceph1f systemd[1]: ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Scheduled restart job, restart counter is at 5.
May 27 14:48:24 ceph1f systemd[1]: Stopped Ceph osd.2 for 8d47792c-987d-11eb-9bb6-a5302e00e1fa.
May 27 14:48:24 ceph1f systemd[1]: ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Start request repeated too quickly.
May 27 14:48:24 ceph1f systemd[1]: ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service: Failed with result 'exit-code'.
May 27 14:48:24 ceph1f systemd[1]: Failed to start Ceph osd.2 for 8d47792c-987d-11eb-9bb6-a5302e00e1fa.
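Since both ExecStart (unit.run) and ExecStopPost (unit.poststop) exit with status 127, which is bash for "command not found", my guess is that something those scripts call is simply no longer installed after the OS re-install, most likely the container engine itself. The commands below are just a sketch of what I would check next on that node; the unit.run path is taken from the status output above:

    # show what the unit actually tries to execute
    sudo cat /var/lib/ceph/8d47792c-987d-11eb-9bb6-a5302e00e1fa/osd.2/unit.run

    # check whether a container engine is present at all
    command -v podman docker

    # full journal for the failing unit
    sudo journalctl -u ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa@osd.2.service --no-pager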
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, May 27, 2021 2:22 PM, mabi <mabi@xxxxxxxxxxxxx> wrote:

> I am trying to run "cephadm shell" on that newly installed OSD node, and it seems that I have now unfortunately configured a new cluster ID, as it shows:
>
> ubuntu@ceph1f:~$ sudo cephadm shell
> ERROR: Cannot infer an fsid, one must be specified: ['8d47792c-987d-11eb-9bb6-a5302e00e1fa', '91a86f20-8083-40b1-8bf1-fe35fac3d677']
>
> Maybe this is causing trouble... So is there any method by which I can remove the wrongly created new cluster ID 91a86f20-8083-40b1-8bf1-fe35fac3d677?
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, May 27, 2021 12:58 PM, mabi mabi@xxxxxxxxxxxxx wrote:
>
> > You are right, I used the FSID of the OSD and not of the cluster in the deploy command. So now I tried again with the cluster ID as FSID, but it still does not work, as you can see below:
> >
> > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 8d47792c-987d-11eb-9bb6-a5302e00e1fa
> > Deploy daemon osd.2 ...
> > Traceback (most recent call last):
> >   File "/usr/local/sbin/cephadm", line 6223, in <module>
> >     r = args.func()
> >   File "/usr/local/sbin/cephadm", line 1440, in _default_image
> >     return func()
> >   File "/usr/local/sbin/cephadm", line 3457, in command_deploy
> >     deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
> >   File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
> >     deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
> >   File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
> >     assert osd_fsid
> > AssertionError
> >
> > In case that's of any help, here is the output of the "cephadm ceph-volume lvm list" command:
> >
> > ====== osd.2 =======
> >
> >   [block]       /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677
> >
> >       block device              /dev/ceph-cca8abe6-cf9b-4c2f-ab81-ae0758585414/osd-block-91a86f20-8083-40b1-8bf1-fe35fac3d677
> >       block uuid                W3omTg-vami-RB0V-CkVb-cgpb-88Jy-pIK2Tz
> >       cephx lockbox secret
> >       cluster fsid              8d47792c-987d-11eb-9bb6-a5302e00e1fa
> >       cluster name              ceph
> >       crush device class        None
> >       encrypted                 0
> >       osd fsid                  91a86f20-8083-40b1-8bf1-fe35fac3d677
> >       osd id                    2
> >       osdspec affinity          all-available-devices
> >       type                      block
> >       vdo                       0
> >       devices                   /dev/sda
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, May 27, 2021 12:32 PM, Eugen Block eblock@xxxxxx wrote:
> >
> > > > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > > Deploy daemon osd.2 ...
> > >
> > > Which fsid is it, the cluster's or the OSD's? According to the 'cephadm deploy' help page it should be the cluster fsid.
> > >
> > > Zitat von mabi mabi@xxxxxxxxxxxxx:
> > >
> > > > Hi Eugen,
> > > > What a good coincidence ;-)
> > > > So I ran "cephadm ceph-volume lvm list" on the OSD node which I re-installed, and it saw my osd.2 OSD. So far so good, but the following suggested command does not work, as you can see below:
> > > >
> > > > ubuntu@ceph1f:~$ sudo cephadm deploy --name osd.2 --fsid 91a86f20-8083-40b1-8bf1-fe35fac3d677
> > > > Deploy daemon osd.2 ...
> > > > Traceback (most recent call last):
> > > >   File "/usr/local/sbin/cephadm", line 6223, in <module>
> > > >     r = args.func()
> > > >   File "/usr/local/sbin/cephadm", line 1440, in _default_image
> > > >     return func()
> > > >   File "/usr/local/sbin/cephadm", line 3457, in command_deploy
> > > >     deploy_daemon(args.fsid, daemon_type, daemon_id, c, uid, gid,
> > > >   File "/usr/local/sbin/cephadm", line 2193, in deploy_daemon
> > > >     deploy_daemon_units(fsid, uid, gid, daemon_type, daemon_id, c,
> > > >   File "/usr/local/sbin/cephadm", line 2255, in deploy_daemon_units
> > > >     assert osd_fsid
> > > > AssertionError
> > > >
> > > > Any ideas what is wrong here?
> > > > Regards,
> > > > Mabi
> > > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Thursday, May 27, 2021 12:13 PM, Eugen Block eblock@xxxxxx wrote:
> > > >
> > > > > Hi,
> > > > > I posted a link to the docs [1], [2] just yesterday ;-)
> > > > > You should see the respective OSD in the output of 'cephadm ceph-volume lvm list' on that node. You should then be able to get it back to cephadm with
> > > > >
> > > > > cephadm deploy --name osd.x
> > > > >
> > > > > But I haven't tried this yet myself, so please report back if that works for you.
> > > > > Regards,
> > > > > Eugen
> > > > >
> > > > > [1] https://tracker.ceph.com/issues/49159
> > > > > [2] https://tracker.ceph.com/issues/46691
> > > > >
> > > > > Zitat von mabi mabi@xxxxxxxxxxxxx:
> > > > >
> > > > > > Hello,
> > > > > > I have by mistake re-installed the OS of an OSD node of my Octopus cluster (managed by cephadm). Luckily the OSD data is on a separate disk and did not get affected by the re-install.
> > > > > > Now I have the following state:
> > > > > >
> > > > > >   health: HEALTH_WARN
> > > > > >           1 stray daemon(s) not managed by cephadm
> > > > > >           1 osds down
> > > > > >           1 host (1 osds) down
> > > > > >
> > > > > > To fix that I tried to run:
> > > > > >
> > > > > > ceph orch daemon add osd ceph1f:/dev/sda
> > > > > > =====================================================================
> > > > > > Created no osd(s) on host ceph1f; already created?
> > > > > >
> > > > > > That did not work, so I tried:
> > > > > >
> > > > > > ceph cephadm osd activate ceph1f
> > > > > > =====================================================================
> > > > > > no valid command found; 10 closest matches:
> > > > > > ...
> > > > > > Error EINVAL: invalid command
> > > > > >
> > > > > > That did not work either. So I wanted to ask: how can I "adopt" an OSD disk back into my cluster?
> > > > > > Thanks for your help.
> > > > > > Regards,
> > > > > > Mabi
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx