Hi everybody,

I've run into a situation where I cannot redeploy an OSD on a new disk. I need to replace osd.30 because the disk keeps reporting I/O errors. I run `ceph orch daemon osd.30 --replace`, then I zap the DB LV:

```
root@server-2:/# ceph-volume lvm zap /dev/ceph-db/db-88
--> Zapping: /dev/ceph-db/db-88
Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-db/db-88 bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
 stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0247342 s, 424 MB/s
--> Zapping successful for: <LV: /dev/ceph-db/db-88>
```

and then the data device:

```
root@server-2:/# ceph-volume lvm zap /dev/sdn
--> Zapping: /dev/sdn
--> --destroy was not specified, but zapping a whole device will remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdn bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 1.35239 s, 7.8 MB/s
--> Zapping successful for: <Raw Device: /dev/sdn>
```

The disk is now ready, and the orchestrator confirms it:

```
root@server-1:~# ceph orch device ls host server-2 --refresh
server-2  /dev/sdn  hdd  ST18000NM008J_5000c500d80398bf  16.3T  Yes  4m ago
```

Now it's time for the orchestrator to add the new OSD:

```
root@server-1:~# ceph orch daemon add osd server-2:data_devices=/dev/sdn,db_devices=/dev/ceph-db/db-88
Created no osd(s) on host server-2; already created?
```

But this just leaves osd.30 in state down. If I try to start the systemd service manually, it fails:

```
Apr 02 12:30:41 server-2 systemd[1]: Started Ceph osd.30 for ea98e312-dfd9-11ee-a226-33f018c3a407.
Apr 02 12:30:41 server-2 bash[3316003]: /bin/bash: /var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/unit.run: No such file or directory
Apr 02 12:30:41 server-2 systemd[1]: ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Main process exited, code=exited, status=127/n/a
Apr 02 12:30:41 server-2 bash[3316014]: /bin/bash: /var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/unit.poststop: No such file or directory
Apr 02 12:30:41 server-2 systemd[1]: ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Failed with result 'exit-code'.
Apr 02 12:30:51 server-2 systemd[1]: ceph-ea98e312-dfd9-11ee-a226-33f018c3a407@osd.30.service: Scheduled restart job, restart counter is at 1.
Apr 02 12:30:51 server-2 systemd[1]: Stopped Ceph osd.30 for ea98e312-dfd9-11ee-a226-33f018c3a407.
```
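So the `unit.run` and `unit.poststop` files are gone, but the daemon directory itself apparently still exists (the traceback below refers to it). For what it's worth, this is roughly how the leftover state can be inspected (a sketch; the fsid is taken from the unit name above):

```
# Sketch: inspect leftover cephadm state for osd.30 on server-2
# (fsid taken from the systemd unit name in the journal above).
ceph osd tree destroyed                    # was osd.30 preserved for replacement?
cephadm ls | grep -A 5 '"osd.30"'          # does cephadm still list the daemon?
ls -la /var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/
```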
And even if I try to redeploy osd.30 via `ceph orch osd redeploy osd.30`, I get this error in `ceph -W cephadm`:

```
2024-04-02T12:41:39.856767+0000 mgr.server-2.opelxj (mgr.2994187) 5453 : cephadm [INF] Reconfiguring daemon osd.30 on server-2
2024-04-02T12:41:41.048352+0000 mgr.server-2.opelxj (mgr.2994187) 5454 : cephadm [ERR] cephadm exited with an error code: 1, stderr: Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd-30
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd-30
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd.30
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-ea98e312-dfd9-11ee-a226-33f018c3a407-osd.30
Reconfig daemon osd.30 ...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 10700, in <module>
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 10688, in main
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 6620, in command_deploy_from
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 6638, in _common_deploy
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 6666, in _dispatch_deploy
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 3792, in deploy_daemon
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 3078, in create_daemon_dirs
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/cephadm.8c89112927b45a1984d03fb02785df709234bdb856619c217e1ad5d54aebef2b/__main__.py", line 708, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/config.new' -> '/var/lib/ceph/ea98e312-dfd9-11ee-a226-33f018c3a407/osd.30/config'
```

Could anyone point me in the direction I should dig into? Thanks in advance!
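P.S. Reading the traceback again: cephadm writes the new config to `config.new` and then renames it onto `config`, and the rename fails because a *directory* named `config` is in the way. The cleanup I'm considering (an untested sketch; paths copied from the logs above):

```
# Untested sketch: move the stale 'config' directory out of cephadm's way,
# then ask the orchestrator to recreate the daemon files.
fsid=ea98e312-dfd9-11ee-a226-33f018c3a407
systemctl stop ceph-$fsid@osd.30.service
mv /var/lib/ceph/$fsid/osd.30/config /var/lib/ceph/$fsid/osd.30/config.stale
ceph orch daemon redeploy osd.30
```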