Re: ceph status not showing correct monitor services

I have two approaches in mind. The first (and preferred) one would be to edit the mon spec so that mon.a001s016 is removed first, giving you a clean state. Get the current spec with:

ceph orch ls mon --export > mon-edit.yaml

Edit the spec file so that mon.a001s016 is not part of it, then apply:

ceph orch apply -i mon-edit.yaml
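For reference, the edited spec could look roughly like this (hosts taken from your monmap; just a sketch, your exported file may contain more fields):

service_type: mon
placement:
  hosts:
    - a001s017
    - a001s018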

This should remove the mon.a001s016 daemon. Then wait a few minutes (until the daemon is actually gone; check locally on the node with 'cephadm ls' and in /var/lib/ceph/<FSID>/removed), add it back to the spec file, and apply again. I would expect a third MON to be deployed. If that doesn't work for some reason, you'll need to inspect the logs to find the root cause.
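Something like this on the node should confirm it's really gone (FSID taken from your 'ceph mon dump' output below):

cephadm ls | grep mon
ls /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/removed/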

The second approach would be to remove and add the daemon manually:

ceph orch daemon rm mon.a001s016

Wait until it's really gone, then add it:

ceph orch daemon add mon a001s016

I'm not entirely sure about the 'daemon add mon' command, you might need to provide something else, I'm typing this from memory.
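If the plain form doesn't work, there is also the variant with an explicit host:IP placement; the IP below is hypothetical, use the actual MON IP of a001s016:

ceph orch daemon add mon a001s016:10.45.128.26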

Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:

Hi Eugen,

Yes, that is it. OSDs were restarted because mon a001s017 was reporting low available space. How do I update the mon map to add mon.a001s016, as it is already online? And how do I update the mgr map to include standby mgr.a001s018, as it is also running?


ceph mon dump
dumped monmap epoch 6
epoch 6
fsid 604d56db-2fab-45db-a9ea-c418f9a8cca8
last_changed 2024-03-31T23:54:18.692983+0000
created 2021-09-30T16:15:12.884602+0000
min_mon_release 16 (pacific)
election_strategy: 1
0: [v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0] mon.a001s018
1: [v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0] mon.a001s017


Thank you,
Anantha

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, April 1, 2024 1:10 PM
To: ceph-users@xxxxxxx
Subject:  Re: ceph status not showing correct monitor services

Maybe it’s just not in the monmap? Can you show the output of:

ceph mon dump

Did you do any maintenance (apparently OSDs restarted recently) and maybe accidentally removed a MON from the monmap?


Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:

Hi Anthony,

Seeing it since yesterday afternoon. It is the same with the mgr services:
"ceph -s" is reporting only TWO instead of THREE.

Also, mon and mgr show "is_active: false", see below.

# ceph orch ps --daemon_type=mgr
NAME                 HOST      PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.a001s016.ctmoay  a001s016  *:8443  running (18M)  3m ago     23M  206M     -        16.2.5   6e73176320aa  169cafcbbb99
mgr.a001s017.bpygfm  a001s017  *:8443  running (19M)  3m ago     23M  332M     -        16.2.5   6e73176320aa  97257195158c
mgr.a001s018.hcxnef  a001s018  *:8443  running (20M)  3m ago     23M  113M     -        16.2.5   6e73176320aa  21ba5896cee2

# ceph orch ls --service_name=mgr
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr              3/3  3m ago     23M  a001s016;a001s017;a001s018;count:3


# ceph orch ps --daemon_type=mon --format=json-pretty

[
  {
    "container_id": "8484a912f96a",
    "container_image_digests": [

docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586<mailto:docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586>
    ],
    "container_image_id":
"6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
    "container_image_name":
docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586<mailto:docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586>,
    "created": "2024-03-31T23:55:16.164155Z",
    "daemon_id": "a001s016",
    "daemon_type": "mon",
    "hostname": "a001s016",
    "is_active": false,
   <== why is it false
    "last_refresh": "2024-04-01T19:38:30.929014Z",
    "memory_request": 2147483648,
    "memory_usage": 761685606,
    "ports": [],
    "service_name": "mon",
    "started": "2024-03-31T23:55:16.268266Z",
    "status": 1,
    "status_desc": "running",
    "version": "16.2.5"
  },


Thank you,
Anantha

From: Anthony D'Atri <aad@xxxxxxxxxxxxxx>
Sent: Monday, April 1, 2024 12:25 PM
To: Adiga, Anantha <anantha.adiga@xxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: ceph status not showing correct monitor services




 a001s017.bpygfm(active, since 13M), standbys: a001s016.ctmoay

Looks like you just had an mgr failover?  Could be that the secondary
mgr hasn't caught up with current events.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



