I have two approaches in mind. The first (and preferred) one would be to edit the mon spec to remove mon.a001s016 and get to a clean state. Get the current spec with:
ceph orch ls mon --export > mon-edit.yaml
Edit the spec file so that mon.a001s016 is no longer part of it, then apply it:
ceph orch apply -i mon-edit.yaml
This should remove the mon.a001s016 daemon. Wait a few minutes until the daemon is actually gone (check locally on the node with 'cephadm ls' and in /var/lib/ceph/<FSID>/removed), then add the host back to the spec file and apply again. I would expect a third MON to be deployed. If that doesn't work for some reason, you'll need to inspect the logs to find the root cause.
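The spec edit itself can be scripted. A minimal sketch, assuming the exported spec uses a placement.hosts list with the three hosts from this thread (the actual orchestrator calls need a live cluster, so they are shown as comments):

```shell
# On a live cluster you would export the real spec first:
#   ceph orch ls mon --export > mon-edit.yaml
# Hypothetical exported spec, shape assumed for illustration:
cat > mon-edit.yaml <<'EOF'
service_type: mon
placement:
  hosts:
  - a001s016
  - a001s017
  - a001s018
EOF
# Drop the misbehaving host from the placement list...
sed -i '/^  - a001s016$/d' mon-edit.yaml
# ...then re-apply the trimmed spec on the cluster:
#   ceph orch apply -i mon-edit.yaml
! grep -q 'a001s016' mon-edit.yaml && echo "a001s016 removed from spec"
```

To add the MON back later, re-insert the host line and run the apply step again.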
The second approach would be to remove and re-add the daemon manually:
ceph orch daemon rm mon.a001s016
Wait until it's really gone, then add it back:
ceph orch daemon add mon a001s016
I'm not entirely sure about the 'daemon add mon' syntax, you might need to provide something else; I'm typing this from memory.
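For what it's worth, the cephadm docs show 'daemon add mon' taking the host, optionally with an IP address or CIDR network appended. A sketch of that approach, with <mon-ip> left as a placeholder for the address mon.a001s016 should bind to (and note that removing a MON daemon may require --force):

```
ceph orch daemon rm mon.a001s016 --force
# wait until 'cephadm ls' on the node no longer lists it, then:
ceph orch daemon add mon a001s016:<mon-ip>
```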
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:
Hi Eugen,
Yes, that is it. OSDs were restarted because mon a001s017 was reporting it is low on available space. How do I update the mon map to add mon.a001s016, as it is already online?
And how do I update the mgr map to include the standby mgr.a001s018, as it is also running.
ceph mon dump
dumped monmap epoch 6
epoch 6
fsid 604d56db-2fab-45db-a9ea-c418f9a8cca8
last_changed 2024-03-31T23:54:18.692983+0000
created 2021-09-30T16:15:12.884602+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0] mon.a001s018
1: [v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0] mon.a001s017
Thank you,
Anantha
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, April 1, 2024 1:10 PM
To: ceph-users@xxxxxxx
Subject: Re: ceph status not showing correct monitor services
Maybe it’s just not in the monmap? Can you show the output of:
ceph mon dump
Did you do any maintenance (apparently OSDs restarted recently) and
maybe accidentally removed a MON from the monmap?
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:
Hi Anthony,
Seeing it since yesterday afternoon. It is the same with the mgr services: "ceph -s" is reporting only TWO instead of THREE.
Also, mon and mgr show "is_active: false", see below.
# ceph orch ps --daemon_type=mgr
NAME                 HOST      PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.a001s016.ctmoay  a001s016  *:8443  running (18M)  3m ago     23M  206M     -        16.2.5   6e73176320aa  169cafcbbb99
mgr.a001s017.bpygfm  a001s017  *:8443  running (19M)  3m ago     23M  332M     -        16.2.5   6e73176320aa  97257195158c
mgr.a001s018.hcxnef  a001s018  *:8443  running (20M)  3m ago     23M  113M     -        16.2.5   6e73176320aa  21ba5896cee2
# ceph orch ls --service_name=mgr
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr          3/3      3m ago     23M  a001s016;a001s017;a001s018;count:3
# ceph orch ps --daemon_type=mon --format=json-pretty
[
{
"container_id": "8484a912f96a",
"container_image_digests": [
"docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
],
"container_image_id":
"6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"container_image_name":
"docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586",
"created": "2024-03-31T23:55:16.164155Z",
"daemon_id": "a001s016",
"daemon_type": "mon",
"hostname": "a001s016",
"is_active": false,
<== why is it false
"last_refresh": "2024-04-01T19:38:30.929014Z",
"memory_request": 2147483648,
"memory_usage": 761685606,
"ports": [],
"service_name": "mon",
"started": "2024-03-31T23:55:16.268266Z",
"status": 1,
"status_desc": "running",
"version": "16.2.5"
},
Thank you,
Anantha
From: Anthony D'Atri <aad@xxxxxxxxxxxxxx>
Sent: Monday, April 1, 2024 12:25 PM
To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: ceph status not showing correct monitor services
a001s017.bpygfm(active, since 13M), standbys: a001s016.ctmoay
Looks like you just had an mgr failover? Could be that the secondary
mgr hasn't caught up with current events.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx