Yes, I deployed via cephadm on CentOS 7, so it is using podman. The
container doesn't even start up, so I don't get a container ID. But I
checked journalctl -xe, and it seems that it is trying to reuse a
container name that still exists.
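In case it helps anyone else hitting this: instead of scrolling through
journalctl -xe, the journal can be scoped to the OSD's systemd unit. A
rough sketch, using the fsid and OSD id from my cluster (adjust to yours):

FSID=d0920c36-2368-11eb-a5de-005056b703af
# Follow the journal for just this OSD's unit
journalctl -xe -u "ceph-${FSID}@osd.0.service"
# List all podman containers, including stopped ones
podman ps -a | grep osd.0

This is the relevant part of the journal: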
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
begun starting up.
Dec 01 11:39:29 gedaopl02 podman[9976]: Error: no container with name or
ID ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0 found: no such container
Dec 01 11:39:29 gedaopl02 systemd[1]: Started Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
finished starting up.
--
-- The start-up result is done.
Dec 01 11:39:29 gedaopl02 bash[9993]: WARNING: The same type, major and
minor should not be used for multiple devices.
Dec 01 11:39:29 gedaopl02 bash[9993]: Error: error creating container
storage: the container name
"ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-activate" is already in
use by
"e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6". You
have to remove that container to be able to reuse that name.: that name
is already in use
Dec 01 11:39:29 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service: main process
exited, code=exited, status=125/n/a
Dec 01 11:39:29 gedaopl02 bash[10033]: WARNING: The same type, major and
minor should not be used for multiple devices.
Dec 01 11:39:29 gedaopl02 bash[10033]: Error: error creating container
storage: the container name
"ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-deactivate" is already
in use by
"ef696c5a92ea891cbd7651cdab66abe6c4ba49b70ef06e44b51c9be1cdfc36d9". You
have to remove that container to be able to reuse that name.: that name
is already in use
Dec 01 11:39:29 gedaopl02 systemd[1]: Unit
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed
state.
Dec 01 11:39:29 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:39 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service holdoff time
over, scheduling restart.
Dec 01 11:39:39 gedaopl02 systemd[1]: Stopped Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
finished shutting down.
Dec 01 11:39:39 gedaopl02 systemd[1]: Starting Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af...
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
begun starting up.
Dec 01 11:39:39 gedaopl02 podman[10134]: Error: no container with name
or ID ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0 found: no such
container
Dec 01 11:39:39 gedaopl02 systemd[1]: Started Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
finished starting up.
--
-- The start-up result is done.
Dec 01 11:39:40 gedaopl02 bash[10150]: WARNING: The same type, major and
minor should not be used for multiple devices.
Dec 01 11:39:40 gedaopl02 bash[10150]: Error: error creating container
storage: the container name
"ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-activate" is already in
use by
"e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6". You
have to remove that container to be able to reuse that name.: that name
is already in use
Dec 01 11:39:40 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service: main process
exited, code=exited, status=125/n/a
Dec 01 11:39:40 gedaopl02 bash[10175]: WARNING: The same type, major and
minor should not be used for multiple devices.
Dec 01 11:39:40 gedaopl02 bash[10175]: Error: error creating container
storage: the container name
"ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-deactivate" is already
in use by
"ef696c5a92ea891cbd7651cdab66abe6c4ba49b70ef06e44b51c9be1cdfc36d9". You
have to remove that container to be able to reuse that name.: that name
is already in use
Dec 01 11:39:40 gedaopl02 systemd[1]: Unit
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed
state.
Dec 01 11:39:40 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:50 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service holdoff time
over, scheduling restart.
Dec 01 11:39:50 gedaopl02 systemd[1]: Stopped Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has
finished shutting down.
Dec 01 11:39:50 gedaopl02 systemd[1]: start request repeated too quickly
for ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
Dec 01 11:39:50 gedaopl02 systemd[1]: Failed to start Ceph osd.0 for
d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has failed.
--
-- The result is failed.
Dec 01 11:39:50 gedaopl02 systemd[1]: Unit
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed
state.
Dec 01 11:39:50 gedaopl02 systemd[1]:
ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:59 gedaopl02 postfix/smtpd[10257]: connect from
localhost[127.0.0.1]
Dec 01 11:39:59 gedaopl02 postfix/smtpd[10257]: disconnect from
localhost[127.0.0.1]
Dec 01 11:40:00 gedaopl02 sshd[10264]: rexec line 32: Deprecated option
ServerKeyBits
Dec 01 11:40:00 gedaopl02 sshd[10264]: error: Could not load host key:
/etc/ssh/ssh_host_dsa_key
Dec 01 11:40:00 gedaopl02 sshd[10264]: Connection closed by 127.0.0.1
port 52624 [preauth]
podman ps -a didn't show that container, so I googled and stumbled
across this post:
https://github.com/containers/podman/issues/2553
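Apparently such leftovers exist only in containers/storage, which is why
a plain podman ps -a doesn't list them. Depending on the podman version,
one of the following should reveal them (I'm not sure which flag the
CentOS 7 build ships, so treat this as a sketch):

podman ps --all --external    # newer podman: show containers only present in storage
podman ps --all --storage     # older podman: same idea, older flag name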
I was able to fix it by running:

podman rm --storage e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6

After that, I reset the failed state of the service and started it again:

systemctl reset-failed ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
systemctl start ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
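Just to double-check that the OSD really came back this time, something
along these lines should do (a sketch, same fsid as above):

systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
podman ps | grep osd.0    # the osd.0 container should now be running
ceph osd tree             # osd.0 should show as up/in again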
Now ceph is doing its magic :)
[root@gedasvl02 ~]# ceph -s
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config
/var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
  cluster:
    id:     d0920c36-2368-11eb-a5de-005056b703af
    health: HEALTH_WARN
            Degraded data redundancy: 1941/39432 objects degraded (4.922%), 19 pgs degraded, 19 pgs undersized
            8 pgs not deep-scrubbed in time

  services:
    mon: 1 daemons, quorum gedasvl02 (age 2w)
    mgr: gedasvl02.vqswxg(active, since 2w), standbys: gedaopl02.yrwzqh
    mds: cephfs:1 {0=cephfs.gedaopl01.zjuhem=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 9m), 3 in (since 9m); 18 remapped pgs

  task status:
    scrub status:
        mds.cephfs.gedaopl01.zjuhem: idle

  data:
    pools:   7 pools, 225 pgs
    objects: 13.14k objects, 77 GiB
    usage:   214 GiB used, 457 GiB / 671 GiB avail
    pgs:     1941/39432 objects degraded (4.922%)
             206 active+clean
             18  active+undersized+degraded+remapped+backfill_wait
             1   active+undersized+degraded+remapped+backfilling

  io:
    recovery: 105 MiB/s, 25 objects/s
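To keep an eye on the recovery until the degraded PGs are gone,
something like this should do (run wherever the ceph CLI / cephadm shell
works):

watch -n 10 ceph -s    # refresh the cluster status every 10 seconds
ceph -w                # or stream the cluster log
ceph pg stat           # one-line PG summary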
Many thanks for your help. This was an excellent "Recovery training" :)
On 01.12.2020 at 11:50, Stefan Kooman wrote:
> On 2020-12-01 10:21, Oliver Weinmann wrote:
>> Hi Stefan,
>> unfortunately it doesn't start.
>> The failed OSD (osd.0) is located on gedaopl02.
>> I can start the service, but then after a minute or so it fails. Maybe
>> I'm looking at the wrong log file, but it's empty:
>
> Maybe it hits a timeout.
>
>> [root@gedaopl02 ~]# tail -f
>> /var/log/ceph/d0920c36-2368-11eb-a5de-005056b703af/ceph-osd.0.log
>> Yesterday, when I deleted the failed OSD and recreated it, there were
>> lots of messages in the log file:
>> https://pastebin.com/5hH27pdR
>
> Mostly housekeeping logs. Are your containers running in docker? A
> docker logs $container-id should give you the right logs in that case.
>
> Gr. Stefan
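P.S.: Since this cluster runs podman rather than docker, the equivalent
way to get at the daemon logs would presumably be either cephadm (which
wraps journalctl for the daemon) or podman logs against the container
name from the journal above. Untested sketch, names taken from my log:

cephadm logs --fsid d0920c36-2368-11eb-a5de-005056b703af --name osd.0
podman logs ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0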