Hello Eugen,

cephadm ls for OSD.41:

  {
      "style": "cephadm:v1",
      "name": "osd.41",
      "fsid": "5436dd5d-83d4-4dc8-a93b-60ab5db145df",
      "systemd_unit": "ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41",
      "enabled": true,
      "state": "error",
      "container_id": null,
      "container_image_name": "docker.io/ceph/ceph:v15.2.5",
      "container_image_id": null,
      "version": null,
      "started": null,
      "created": "2020-07-28T12:42:17.292765",
      "deployed": "2020-10-21T11:29:36.284462",
      "configured": "2020-10-21T11:29:47.032038"
  },

root@ceph06:~# systemctl start ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service
Job for ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service failed because the control process exited with error code.
See "systemctl status ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service" and "journalctl -xe" for details.

● ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service - Ceph osd.41 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df
   Loaded: loaded (/etc/systemd/system/ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2020-11-02 10:56:50 CET; 9min ago
  Process: 430022 ExecStartPre=/usr/bin/docker rm ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.41 (code=exited, status=1/FAILURE)
  Process: 430040 ExecStart=/bin/bash /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41/unit.run (code=exited, status=125)
  Process: 430159 ExecStopPost=/bin/bash /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41/unit.poststop (code=exited, status=0/SUCCESS)
 Main PID: 430040 (code=exited, status=125)
    Tasks: 51 (limit: 9830)
   Memory: 31.0M
   CGroup: /system.slice/system-ceph\x2d5436dd5d\x2d83d4\x2d4dc8\x2da93b\x2d60ab5db145df.slice/ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service
           ├─224974 /bin/bash /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41/unit.run
           └─225079 /usr/bin/docker run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.41 -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.5 -e NODE_NAME......

Nov 02 10:56:50 ceph06 systemd[1]: Failed to start Ceph osd.41 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
Nov 02 11:01:21 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Start request repeated too quickly.
Nov 02 11:01:21 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Failed with result 'exit-code'.
Nov 02 11:01:21 ceph06 systemd[1]: Failed to start Ceph osd.41 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
Nov 02 11:01:49 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Start request repeated too quickly.
Nov 02 11:01:49 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Failed with result 'exit-code'.
Nov 02 11:01:49 ceph06 systemd[1]: Failed to start Ceph osd.41 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
Nov 02 11:05:34 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Start request repeated too quickly.
Nov 02 11:05:34 ceph06 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service: Failed with result 'exit-code'.
Nov 02 11:05:34 ceph06 systemd[1]: Failed to start Ceph osd.41 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
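For completeness, this is how I am digging further on the host. These are just my own debugging steps (the paths are the ones from the status output above, nothing here is from the cephadm docs):

# print the exact docker command the ExecStart wrapper runs (the status output above truncates it)
root@ceph06:~# cat /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41/unit.run

# full journal for the unit, to see the complete error from the failed starts
root@ceph06:~# journalctl -u ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service --no-pager -n 100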
If I run the docker command from unit.run manually, I get:

root@ceph06:~# /usr/bin/docker run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.41 -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15.2.5 -e NODE_NAME=ceph06 -v /var/run/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df:/var/run/ceph:z -v /var/log/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df:/var/log/ceph:z -v /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41:/var/lib/ceph/osd/ceph-41:z -v /var/lib/ceph/5436dd5d-83d4-4dc8-a93b-60ab5db145df/osd.41/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm --entrypoint /usr/bin/ceph-osd docker.io/ceph/ceph:v15.2.5 -n osd.41 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug

/usr/bin/docker: Error response from daemon: endpoint with name ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.41 already exists in network host.

Can you see a container ID error here? The cluster ID is: 5436dd5d-83d4-4dc8-a93b-60ab5db145df

(What I plan to try next is further down in this mail, below my quoted original question.)

On Mon, Nov 2, 2020 at 10:03 AM Eugen Block <eblock@xxxxxx> wrote:
>
> Hi,
>
> are you sure it's the right container ID you're using for the restart?
> I noticed that 'cephadm ls' shows older containers after a daemon had
> to be recreated (a MGR in my case). Maybe you're trying to restart a
> daemon that was already removed?
>
> Regards,
> Eugen
>
>
> Zitat von Ml Ml <mliebherr99@xxxxxxxxxxxxxx>:
>
> > Hello List,
> >
> > sometimes some OSDs get taken out for some reason (I am still looking
> > for the reason, and I guess it's due to some overload). However, when I
> > try to restart them I get:
> >
> > Nov 02 08:05:26 ceph05 bash[9811]: Error: No such container: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.47
> > Nov 02 08:05:29 ceph05 bash[9811]: /usr/bin/docker: Error response from daemon: endpoint with name ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.47 already exists in network host.
> > Nov 02 08:05:29 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Main process exited, code=exited, status=125/n/a
> > Nov 02 08:05:34 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Failed with result 'exit-code'.
> > Nov 02 08:05:44 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Service RestartSec=10s expired, scheduling restart.
> > Nov 02 08:05:44 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Scheduled restart job, restart counter is at 5.
> > Nov 02 08:05:44 ceph05 systemd[1]: Stopped Ceph osd.47 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
> > Nov 02 08:05:44 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Start request repeated too quickly.
> > Nov 02 08:05:44 ceph05 systemd[1]: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47.service: Failed with result 'exit-code'.
> > Nov 02 08:05:44 ceph05 systemd[1]: Failed to start Ceph osd.47 for 5436dd5d-83d4-4dc8-a93b-60ab5db145df.
> >
> > I need to reboot the full host to get the OSD back in again. As far as I
> > can see this is some docker problem?
> >
> > root@ceph05:~# docker ps | grep osd.47   => no hit
> > root@ceph05:~# docker network prune     => does not solve the problem
> >
> > Any hint on that?
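Following up on my own question above: the next thing I want to try is to check whether docker is still holding a stale endpoint with the OSD's container name on the host network, and to drop it by hand. This is only a guess on my side -- the disconnect and reset-failed steps below are plain docker/systemd commands, not anything from the cephadm docs:

# does the host network still list an endpoint with the container name?
root@ceph06:~# docker network inspect host | grep -A 5 osd.41

# if it does, force-remove the stale endpoint (my assumption is that this is what blocks the new container)
root@ceph06:~# docker network disconnect --force host ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.41

# clear systemd's "Start request repeated too quickly" state and try again
root@ceph06:~# systemctl reset-failed ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service
root@ceph06:~# systemctl start ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41.service

If that brings the OSD back, it would at least save me the full host reboot.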
> >
> > Thanks,
> > Michael

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx