You must have missed the response to your thread, I suppose:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
Quoting mabi <mabi@xxxxxxxxxxxxx>:
Hello,
A few days later, the ceph status progress bar is still stuck and, for
some unknown reason, the third mon is still not deploying itself, as
can be seen from the "ceph orch ls" output below:
ceph orch ls
NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager   ?:9093,9094  1/1      3m ago     5w   count:1
crash                       7/7      3m ago     5w   *
grafana        ?:3000       1/1      3m ago     5w   count:1
mgr                         2/2      3m ago     4w   count:2;label:mgr
mon                         2/3      3m ago     16h  count:3;label:mon
node-exporter  ?:9100       7/7      3m ago     5w   *
osd                         1/1      3m ago     -    <unmanaged>
prometheus     ?:9095       1/1      3m ago     5w   count:1
Is this a bug in cephadm, and is there a workaround?
Thanks for any hints.
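In case it is useful, these are the next steps I am considering, taken
from the cephadm docs rather than anything I have verified on this
cluster yet, so please treat it as a rough sketch:

# fail over the active mgr so a fresh mgr instance restarts the orchestrator module
ceph mgr fail

# then ask cephadm to redeploy the monitor daemon that shows up as stopped
ceph orch daemon redeploy mon.ceph1b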
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, September 7th, 2021 at 2:30 PM, mabi <mabi@xxxxxxxxxxxxx> wrote:
Hello
I have a 7-node test Ceph Pacific 16.2.5 cluster deployed with cephadm
on Ubuntu 20.04 LTS bare metal. I just upgraded each node's kernel and
performed a rolling reboot, and now the ceph -s output is somehow
stuck and the manager service is only deployed to two nodes instead
of 3. Here is the ceph -s output:
  cluster:
    id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
    health: HEALTH_WARN
            OSD count 1 < osd_pool_default_size 3

  services:
    mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
    mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
    osd: 1 osds: 1 up (since 30m), 1 in (since 3w)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
    pgs:

  progress:
    Updating crash deployment (-1 -> 6) (0s)
      [............................]
Ignore the HEALTH_WARN about the OSD count because I have not
finished deploying all 3 OSDs. But you can see that the progress bar
is stuck and that I only have 2 managers; the third one does not seem
to start, as can be seen here:
$ ceph orch ps|grep stopped
mon.ceph1b ceph1b stopped 4m ago 4w - 2048M <unknown> <unknown> <unknown>
It looks like the orchestrator is stuck and does not continue its
job. Any idea how I can get it unstuck?
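For reference, this is roughly what I plan to try next, pieced
together from the cephadm troubleshooting docs; I have not run any of
it yet, so corrections are welcome (the daemon name is the one from
the output above):

# see what the cephadm/orchestrator module has been logging
ceph log last cephadm

# try to explicitly start the stopped daemon on ceph1b
ceph orch daemon start mon.ceph1b

# clear the stale progress item, if this release already has the progress module commands
ceph progress clear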
Best regards,
Mabi
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx