I have two approaches in mind. The first (and preferred) one would be to edit the mon spec to remove mon.a001s016 and get to a clean state. Get the current spec with:
ceph orch ls mon --export > mon-edit.yaml
Edit the spec file so that mon.a001s016 is no longer part of it, then apply it:
ceph orch apply -i mon-edit.yaml
This should remove the mon.a001s016 daemon. Wait a few minutes until the daemon is actually gone (check locally on the node with 'cephadm ls' and in /var/lib/ceph/<FSID>/removed), then add the host back to the spec file and apply again. I would expect a third MON to be deployed. If that doesn't work for some reason, you'll need to inspect the logs to find the root cause.
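The spec edit itself can be scripted. A minimal sketch, assuming the exported spec uses a placement.hosts list with the three hosts from this thread (the actual orchestrator calls need a live cluster, so they are shown as comments):

```shell
# On a live cluster you would export the real spec first:
#   ceph orch ls mon --export > mon-edit.yaml
# Hypothetical exported spec, shape assumed for illustration:
cat > mon-edit.yaml <<'EOF'
service_type: mon
placement:
  hosts:
  - a001s016
  - a001s017
  - a001s018
EOF
# Drop the misbehaving host from the placement list...
sed -i '/^  - a001s016$/d' mon-edit.yaml
# ...then re-apply the trimmed spec on the cluster:
#   ceph orch apply -i mon-edit.yaml
! grep -q 'a001s016' mon-edit.yaml && echo "a001s016 removed from spec"
```

To add the MON back later, re-insert the host line and run the apply step again.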
The second approach would be to remove and re-add the daemon manually:
ceph orch daemon rm mon.a001s016
Wait until it's really gone, then add it back:
ceph orch daemon add mon a001s016
I'm not entirely sure about the 'daemon add mon' syntax, you might need to provide something else; I'm typing this from memory.
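For what it's worth, the cephadm docs show 'daemon add mon' taking the host, optionally with an IP address or CIDR network appended. A sketch of that approach, with <mon-ip> left as a placeholder for the address mon.a001s016 should bind to (and note that removing a MON daemon may require --force):

```
ceph orch daemon rm mon.a001s016 --force
# wait until 'cephadm ls' on the node no longer lists it, then:
ceph orch daemon add mon a001s016:<mon-ip>
```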
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:
Hi Eugen,
Yes, that is it. OSDs were restarted because mon a001s017 was reporting it is low on available space. How do I update the mon map to add mon.a001s016, as it is already online?
And how do I update the mgr map to include the standby mgr.a001s018, as it is also running.
ceph mon dump
dumped monmap epoch 6
epoch 6
fsid 604d56db-2fab-45db-a9ea-c418f9a8cca8
last_changed 2024-03-31T23:54:18.692983+0000
created 2021-09-30T16:15:12.884602+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0] mon.a001s018
1: [v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0] mon.a001s017
Thank you,
Anantha
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, April 1, 2024 1:10 PM
To: ceph-users@xxxxxxx
Subject: Re: ceph status not showing correct monitor services
Maybe it’s just not in the monmap? Can you show the output of:
ceph mon dump
Did you do any maintenance (apparently OSDs restarted recently) and
maybe accidentally removed a MON from the monmap?
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:
Hi Anthony,
Seeing it since yesterday afternoon. It is the same with the mgr services: "ceph -s" is reporting only TWO instead of THREE.
Also, mon and mgr show "is_active: false", see below.
# ceph orch ps --daemon_type=mgr
NAME                 HOST      PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.a001s016.ctmoay  a001s016  *:8443  running (18M)  3m ago     23M  206M     -        16.2.5   6e73176320aa  169cafcbbb99
mgr.a001s017.bpygfm  a001s017  *:8443  running (19M)  3m ago     23M  332M     -        16.2.5   6e73176320aa  97257195158c
mgr.a001s018.hcxnef  a001s018  *:8443  running (20M)  3m ago     23M  113M     -        16.2.5   6e73176320aa  21ba5896cee2
# ceph orch ls --service_name=mgr
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr          3/3      3m ago     23M  a001s016;a001s017;a001s018;count:3
# ceph orch ps --daemon_type=mon --format=json-pretty
[
{
"container_id": "8484a912f96a",
"container_image_digests": [
"docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
],
"container_image_id":
"6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"container_image_name":
"docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586",
"created": "2024-03-31T23:55:16.164155Z",
"daemon_id": "a001s016",
"daemon_type": "mon",
"hostname": "a001s016",
"is_active": false,
<== why is it false
"last_refresh": "2024-04-01T19:38:30.929014Z",
"memory_request": 2147483648,
"memory_usage": 761685606,
"ports": [],
"service_name": "mon",
"started": "2024-03-31T23:55:16.268266Z",
"status": 1,
"status_desc": "running",
"version": "16.2.5"
},
Thank you,
Anantha
From: Anthony D'Atri <aad@xxxxxxxxxxxxxx>
Sent: Monday, April 1, 2024 12:25 PM
To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: ceph status not showing correct monitor services
a001s017.bpygfm(active, since 13M), standbys: a001s016.ctmoay
Looks like you just had an mgr failover? Could be that the secondary
mgr hasn't caught up with current events.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx