Hi Oliver,

# ssh gedaopl02
# cephadm rm-daemon --name osd.0

should do the trick. Just be careful to remove the broken OSD and not the
healthy one :-)
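Spelled out a little more, as a rough sketch (the fsid is the one cephadm
reports in your output below; depending on the cephadm version you may also
need --force, so double-check what is listed before removing anything):

# ssh gedaopl02
# cephadm ls | grep -A2 '"osd.0"'    # the stale osd.0 should still be listed here
# cephadm rm-daemon --name osd.0 --fsid d0920c36-2368-11eb-a5de-005056b703af

Afterwards, from a node with the admin keyring:

# ceph orch ps --refresh
# ceph health detail

rm-daemon only removes the leftover daemon (its systemd unit and data
directory) on gedaopl02; the osd.0 that is actually running on gedaopl01 is
not touched.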
Best,
Sebastian

On 11.03.21 22:10, Oliver Weinmann wrote:
> Hi,
>
> On my 3-node Octopus 15.2.5 test cluster, which I haven't used for quite a
> while, I noticed that it shows some errors:
>
> [root@gedasvl02 ~]# ceph health detail
> INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
> INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
> INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
> HEALTH_WARN 2 failed cephadm daemon(s)
> [WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
>     daemon osd.0 on gedaopl02 is in error state
>     daemon node-exporter.gedaopl01 on gedaopl01 is in error state
>
> The error about osd.0 is strange, since osd.0 is actually up and running,
> but on a different node. I guess I failed to remove it properly from node
> gedaopl02 and then added a new OSD on a different node (gedaopl01), and now
> there are duplicate OSD IDs for osd.0 and osd.2.
>
> [root@gedasvl02 ~]# ceph orch ps
> INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
> INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
> INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
> NAME                         HOST       STATUS        REFRESHED  AGE  VERSION    IMAGE NAME                            IMAGE ID      CONTAINER ID
> alertmanager.gedasvl02       gedasvl02  running (6h)  7m ago     4M   0.20.0     docker.io/prom/alertmanager:v0.20.0   0881eb8f169f  5b80fb977a5f
> crash.gedaopl01              gedaopl01  stopped       7m ago     4M   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  810cf432b6d6
> crash.gedaopl02              gedaopl02  running (5h)  7m ago     4M   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  34ab264fd5ed
> crash.gedaopl03              gedaopl03  running (2d)  7m ago     2d   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  233f30086d2d
> crash.gedasvl02              gedasvl02  running (6h)  7m ago     4M   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  ea3d3e7c4f58
> grafana.gedasvl02            gedasvl02  running (6h)  7m ago     4M   6.6.2      docker.io/ceph/ceph-grafana:6.6.2     a0dce381714a  5a94f3e41c32
> mds.cephfs.gedaopl01.zjuhem  gedaopl01  stopped       7m ago     3M   <unknown>  docker.io/ceph/ceph:v15               <unknown>     <unknown>
> mds.cephfs.gedasvl02.xsjtpi  gedasvl02  running (6h)  7m ago     3M   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  26e7c8759d89
> mgr.gedaopl03.zilwbl         gedaopl03  running (7h)  7m ago     7h   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  e18b6f40871c
> mon.gedaopl03                gedaopl03  running (7h)  7m ago     7h   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  5afdf40e41ba
> mon.gedasvl02                gedasvl02  running (6h)  7m ago     4M   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  e83dfcd864aa
> node-exporter.gedaopl01      gedaopl01  error         7m ago     4M   0.18.1     docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  0fefcfcc9639
> node-exporter.gedaopl02      gedaopl02  running (5h)  7m ago     4M   0.18.1     docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  f459045b7e41
> node-exporter.gedaopl03      gedaopl03  running (2d)  7m ago     2d   0.18.1     docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  3bd9f8dd6d5b
> node-exporter.gedasvl02      gedasvl02  running (6h)  7m ago     4M   0.18.1     docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  72e96963261e
> *osd.0                       gedaopl01  running (5h)  7m ago     5h   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  ed76fafb1988*
> *osd.0                       gedaopl02  error         7m ago     4M   <unknown>  docker.io/ceph/ceph:v15               <unknown>     <unknown>*
> osd.1                        gedaopl01  running (4h)  7m ago     3d   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  41a43733e601
> *osd.2                       gedaopl01  stopped       7m ago     4M   <unknown>  docker.io/ceph/ceph:v15               <unknown>     <unknown>*
> *osd.2                       gedaopl03  running (7h)  7m ago     7h   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  ac9e660db2fb*
> osd.3                        gedaopl03  running (7h)  7m ago     7h   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  bde17b5bb2fb
> osd.4                        gedaopl02  running (5h)  7m ago     3d   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  7cc3ef7c4469
> osd.5                        gedaopl02  running (5h)  7m ago     3d   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  761b96d235e4
> osd.6                        gedaopl02  running (5h)  7m ago     3d   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  d047b28fe2bd
> osd.7                        gedaopl03  running (7h)  7m ago     7h   15.2.9     docker.io/ceph/ceph:v15               dfc483079636  3b54b01841f4
> osd.8                        gedaopl01  running (5h)  7m ago     5h   15.2.5     docker.io/ceph/ceph:v15               4405f6339e35  cdd308cdc82b
> prometheus.gedasvl02         gedasvl02  running (5h)  7m ago     4M   2.18.1     docker.io/prom/prometheus:v2.18.1     de242295e225  591cef3bbaa4
>
> Is there a way to clean / purge the stopped and error ones?
>
> I don't know what is wrong with the node-exporter, because podman ps -a on
> gedaopl01 looks fine to me. Maybe it is also a zombie daemon?
>
> [root@gedaopl01 ~]# podman ps -a
> CONTAINER ID  IMAGE                                 COMMAND               CREATED         STATUS             PORTS  NAMES
> e71898f7d038  docker.io/prom/node-exporter:v0.18.1  --no-collector.ti...  54 seconds ago  Up 54 seconds ago         ceph-d0920c36-2368-11eb-a5de-005056b703af-node-exporter.gedaopl01
> 41a43733e601  docker.io/ceph/ceph:v15               -n osd.1 -f --set...  5 hours ago     Up 5 hours ago            ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.1
> 810cf432b6d6  docker.io/ceph/ceph:v15               -n client.crash.g...  6 hours ago     Up 6 hours ago            ceph-d0920c36-2368-11eb-a5de-005056b703af-crash.gedaopl01
> cdd308cdc82b  docker.io/ceph/ceph:v15               -n osd.8 -f --set...  6 hours ago     Up 6 hours ago            ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.8
> ed76fafb1988  docker.io/ceph/ceph:v15               -n osd.0 -f --set...  6 hours ago     Up 6 hours ago            ceph-d0920c36-2368-11eb-a5de-005056b703af-osd.0
>
> I replaced the very old disks with brand new SAMSUNG PM883 drives and would
> like to upgrade to 15.2.9, but the upgrade guide recommends doing this only
> on a healthy cluster. :)
>
> Cheers,
>
> Oliver
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer
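Regarding the question about cleaning up the other stopped/error entries: the
same rm-daemon pattern should work for anything that is clearly a leftover,
e.g. the duplicate osd.2 record on gedaopl01. The node-exporter container
itself is up according to podman, so redeploying it via the orchestrator may
already be enough. A rough, untested sketch, with the daemon names and fsid
taken from the output above:

On gedaopl01, drop the stale osd.2 record:

# cephadm rm-daemon --name osd.2 --fsid d0920c36-2368-11eb-a5de-005056b703af

From a node with the admin keyring, recreate the node-exporter unit and then
refresh and re-check health as above:

# ceph orch daemon redeploy node-exporter.gedaopl01

If the node-exporter keeps coming back in error state, the systemd/journal
logs for its unit on gedaopl01 would be the next place to look.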
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx