Re: osd out can't bring it back online

Yes, I deployed via cephadm on CentOS 7, so it is using podman. The container doesn't even start up, so I don't get a container ID. But I checked journalctl -xe, and it seems that it's trying to use a container name that still exists.
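
(Side note: since the unit name shows up in the messages, the same output can be narrowed down to just this OSD with something like

journalctl -u ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service

which cuts out the unrelated postfix/sshd noise further down.)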

-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has begun starting up.
Dec 01 11:39:29 gedaopl02 podman[9976]: Error: no container with name or ID ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0 found: no such container
Dec 01 11:39:29 gedaopl02 systemd[1]: Started Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished starting up.
--
-- The start-up result is done.
Dec 01 11:39:29 gedaopl02 bash[9993]: WARNING: The same type, major and minor should not be used for multiple devices.
Dec 01 11:39:29 gedaopl02 bash[9993]: Error: error creating container storage: the container name "ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-activate" is already in use by "e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6". You have to remove that container to be able to reuse that name.: that name is already in use
Dec 01 11:39:29 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service: main process exited, code=exited, status=125/n/a
Dec 01 11:39:29 gedaopl02 bash[10033]: WARNING: The same type, major and minor should not be used for multiple devices.
Dec 01 11:39:29 gedaopl02 bash[10033]: Error: error creating container storage: the container name "ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-deactivate" is already in use by "ef696c5a92ea891cbd7651cdab66abe6c4ba49b70ef06e44b51c9be1cdfc36d9". You have to remove that container to be able to reuse that name.: that name is already in use
Dec 01 11:39:29 gedaopl02 systemd[1]: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed state.
Dec 01 11:39:29 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:39 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service holdoff time over, scheduling restart.
Dec 01 11:39:39 gedaopl02 systemd[1]: Stopped Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished shutting down.
Dec 01 11:39:39 gedaopl02 systemd[1]: Starting Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af...
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has begun starting up.
Dec 01 11:39:39 gedaopl02 podman[10134]: Error: no container with name or ID ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0 found: no such container
Dec 01 11:39:39 gedaopl02 systemd[1]: Started Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished starting up.
--
-- The start-up result is done.
Dec 01 11:39:40 gedaopl02 bash[10150]: WARNING: The same type, major and minor should not be used for multiple devices.
Dec 01 11:39:40 gedaopl02 bash[10150]: Error: error creating container storage: the container name "ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-activate" is already in use by "e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6". You have to remove that container to be able to reuse that name.: that name is already in use
Dec 01 11:39:40 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service: main process exited, code=exited, status=125/n/a
Dec 01 11:39:40 gedaopl02 bash[10175]: WARNING: The same type, major and minor should not be used for multiple devices.
Dec 01 11:39:40 gedaopl02 bash[10175]: Error: error creating container storage: the container name "ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0-deactivate" is already in use by "ef696c5a92ea891cbd7651cdab66abe6c4ba49b70ef06e44b51c9be1cdfc36d9". You have to remove that container to be able to reuse that name.: that name is already in use
Dec 01 11:39:40 gedaopl02 systemd[1]: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed state.
Dec 01 11:39:40 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:50 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service holdoff time over, scheduling restart.
Dec 01 11:39:50 gedaopl02 systemd[1]: Stopped Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has finished shutting down.
Dec 01 11:39:50 gedaopl02 systemd[1]: start request repeated too quickly for ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
Dec 01 11:39:50 gedaopl02 systemd[1]: Failed to start Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af.
-- Subject: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service has failed.
--
-- The result is failed.
Dec 01 11:39:50 gedaopl02 systemd[1]: Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service entered failed state.
Dec 01 11:39:50 gedaopl02 systemd[1]: ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service failed.
Dec 01 11:39:59 gedaopl02 postfix/smtpd[10257]: connect from localhost[127.0.0.1]
Dec 01 11:39:59 gedaopl02 postfix/smtpd[10257]: disconnect from localhost[127.0.0.1]
Dec 01 11:40:00 gedaopl02 sshd[10264]: rexec line 32: Deprecated option ServerKeyBits
Dec 01 11:40:00 gedaopl02 sshd[10264]: error: Could not load host key: /etc/ssh/ssh_host_dsa_key
Dec 01 11:40:00 gedaopl02 sshd[10264]: Connection closed by 127.0.0.1 port 52624 [preauth]

podman ps -a didn't show that container, so I googled and stumbled upon this post:

https://github.com/containers/podman/issues/2553

I was able to fix it by running:

podman rm --storage e43f8533d6418267d7e6f3a408a566b4221df4fb51b13d71c634ee697914bad6
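
(In case someone else runs into this: the leftover storage-only containers that podman ps -a doesn't show can, depending on the podman version, be listed with something like

podman ps --all --storage

on newer releases the flag is called --external instead, so check podman ps --help for the exact spelling on your version.)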

After that, I reset the failed state of the service and started it again:

systemctl reset-failed ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
systemctl start ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
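
As a quick sanity check that it actually came back, something like this should show the unit active and the OSD container running again:

systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
podman ps | grep osd.0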

Now Ceph is doing its magic :)

[root@gedasvl02 ~]# ceph -s
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
  cluster:
    id:     d0920c36-2368-11eb-a5de-005056b703af
    health: HEALTH_WARN
            Degraded data redundancy: 1941/39432 objects degraded (4.922%), 19 pgs degraded, 19 pgs undersized
            8 pgs not deep-scrubbed in time

  services:
    mon: 1 daemons, quorum gedasvl02 (age 2w)
    mgr: gedasvl02.vqswxg(active, since 2w), standbys: gedaopl02.yrwzqh
    mds: cephfs:1 {0=cephfs.gedaopl01.zjuhem=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 9m), 3 in (since 9m); 18 remapped pgs

  task status:
    scrub status:
        mds.cephfs.gedaopl01.zjuhem: idle

  data:
    pools:   7 pools, 225 pgs
    objects: 13.14k objects, 77 GiB
    usage:   214 GiB used, 457 GiB / 671 GiB avail
    pgs:     1941/39432 objects degraded (4.922%)
             206 active+clean
             18  active+undersized+degraded+remapped+backfill_wait
             1   active+undersized+degraded+remapped+backfilling

  io:
    recovery: 105 MiB/s, 25 objects/s
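
The degraded/undersized PGs should clear on their own once backfill finishes; progress can be followed with something like

ceph -s
ceph -w

(ceph -w just streams the cluster log, so it's the more convenient way to watch the backfill complete.)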

Many thanks for your help. This was an excellent "Recovery training" :)

On 01.12.2020 at 11:50, Stefan Kooman wrote:
> On 2020-12-01 10:21, Oliver Weinmann wrote:
>> Hi Stefan,
>>
>> unfortunately it doesn't start.
>>
>> The failed osd (osd.0) is located on gedaopl02.
>> I can start the service but then after a minute or so it fails. Maybe
>> I'm looking at the wrong log file, but it's empty:
> Maybe it hits a timeout.
>> [root@gedaopl02 ~]# tail -f
>> /var/log/ceph/d0920c36-2368-11eb-a5de-005056b703af/ceph-osd.0.log
>>
>> Yesterday when I deleted the failed osd and recreated it, there were
>> lots of messages in the log file:
>>
>> https://pastebin.com/5hH27pdR
> Mostly housekeeping logs. Are your containers running in docker? A
> docker logs $container-id should give you the right logs in that case.
>
> Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



