Re: ceph status not showing correct monitor services

Thank you. I will try the export and import method first.

Thank you,
Anantha

-----Original Message-----
From: Eugen Block <eblock@xxxxxx> 
Sent: Monday, April 1, 2024 1:57 PM
To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re:  Re: ceph status not showing correct monitor services

I have two approaches in mind. The first (and preferred) one would be to edit the mon spec so that mon.a001s016 is removed first and you start from a clean state.
Get the current spec with:

ceph orch ls mon --export > mon-edit.yaml

Edit the spec file so that mon.a001s016 is not part of it, then apply:

ceph orch apply -i mon-edit.yaml

This should remove the mon.a001s016 daemon. Then wait a few minutes (until the daemon is actually gone; check locally on the node with 'cephadm ls' and in /var/lib/ceph/<FSID>/removed), add mon.a001s016 back to the spec file, and apply it again. I would expect a third MON to be deployed. If that doesn't work for some reason, you'll need to inspect the logs to find the root cause.
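Just as an illustration (your exported spec will likely contain more fields, which you should leave in place), the edited mon-edit.yaml without a001s016 could look roughly like this, using the hostnames from your outputs:

service_type: mon
placement:
  hosts:
  - a001s017
  - a001s018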

The second approach would be to remove and add the daemon manually:

ceph orch daemon rm mon.a001s016

Wait until it's really gone, then add it:

ceph orch daemon add mon a001s016

I'm not entirely sure about the 'daemon add mon' command, you might need to provide something else; I'm typing this from memory.
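If I recall correctly, the cephadm docs use the form 'ceph orch daemon add mon <host>:<ip-or-network>', so something like the following (the 10.45.128.0/24 network is just my guess based on your mon dump output, adjust it to your public network):

ceph orch daemon add mon a001s016:10.45.128.0/24

Please double-check the exact syntax in the orchestrator documentation for your Ceph release before running it.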

Zitat von "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:

> Hi Eugen,
>
> Yes, that is it. The OSDs were restarted because mon.a001s017 was 
> reporting it was low on available space. How do I update the mon map 
> to add mon.a001s016, as it is already online?
> And how do I update the mgr map to include the standby mgr.a001s018, 
> as it is also running?
>
>
> ceph mon dump
> dumped monmap epoch 6
> epoch 6
> fsid 604d56db-2fab-45db-a9ea-c418f9a8cca8
> last_changed 2024-03-31T23:54:18.692983+0000
> created 2021-09-30T16:15:12.884602+0000
> min_mon_release 16 (pacific)
> election_strategy: 1
> 0: [v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0] mon.a001s018
> 1: [v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0] mon.a001s017
>
>
> Thank you,
> Anantha
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Monday, April 1, 2024 1:10 PM
> To: ceph-users@xxxxxxx
> Subject:  Re: ceph status not showing correct monitor 
> services
>
> Maybe it’s just not in the monmap? Can you show the output of:
>
> ceph mon dump
>
> Did you do any maintenance (apparently OSDs restarted recently) and 
> maybe accidentally removed a MON from the monmap?
>
>
> Zitat von "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:
>
>> Hi Anthony,
>>
>> Seeing it since yesterday afternoon. It is the same with the mgr 
>> services: "ceph -s" is reporting only TWO instead of THREE.
>>
>> Also, mon and mgr show "is_active: false", see below.
>>
>> # ceph orch ps --daemon_type=mgr
>> NAME                 HOST      PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
>> mgr.a001s016.ctmoay  a001s016  *:8443  running (18M)  3m ago     23M  206M     -        16.2.5   6e73176320aa  169cafcbbb99
>> mgr.a001s017.bpygfm  a001s017  *:8443  running (19M)  3m ago     23M  332M     -        16.2.5   6e73176320aa  97257195158c
>> mgr.a001s018.hcxnef  a001s018  *:8443  running (20M)  3m ago     23M  113M     -        16.2.5   6e73176320aa  21ba5896cee2
>>
>> # ceph orch ls --service_name=mgr
>> NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
>> mgr              3/3  3m ago     23M  a001s016;a001s017;a001s018;count:3
>>
>>
>> # ceph orch ps --daemon_type=mon --format=json-pretty
>>
>> [
>>   {
>>     "container_id": "8484a912f96a",
>>     "container_image_digests": [
>>       "docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
>>     ],
>>     "container_image_id": "6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
>>     "container_image_name": "docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586",
>>     "created": "2024-03-31T23:55:16.164155Z",
>>     "daemon_id": "a001s016",
>>     "daemon_type": "mon",
>>     "hostname": "a001s016",
>>     "is_active": false,
>>    <== why is it false
>>     "last_refresh": "2024-04-01T19:38:30.929014Z",
>>     "memory_request": 2147483648,
>>     "memory_usage": 761685606,
>>     "ports": [],
>>     "service_name": "mon",
>>     "started": "2024-03-31T23:55:16.268266Z",
>>     "status": 1,
>>     "status_desc": "running",
>>     "version": "16.2.5"
>>   },
>>
>>
>> Thank you,
>> Anantha
>>
>> From: Anthony D'Atri <aad@xxxxxxxxxxxxxx>
>> Sent: Monday, April 1, 2024 12:25 PM
>> To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
>> Cc: ceph-users@xxxxxxx
>> Subject: Re:  ceph status not showing correct monitor 
>> services
>>
>>
>>
>>
>>  a001s017.bpygfm(active, since 13M), standbys: a001s016.ctmoay
>>
>> Looks like you just had an mgr failover?  Could be that the secondary 
>> mgr hasn't caught up with current events.
>
>



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



