Re: ceph progress bar stuck and 3rd manager not deploying

You must have missed the response to your thread, I suppose:

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
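
In short, and in case the link ever goes stale (I don't remember the exact wording there, so treat this as the generic approach rather than that specific answer): when the progress bar and the orchestrator look wedged after a reboot, the usual first step is to fail over the active mgr, which restarts the cephadm and progress modules, and then force a fresh inventory:

$ ceph mgr fail                   # fail over to the standby mgr; the cephadm module restarts with it
$ ceph orch ps --refresh          # force a fresh daemon inventory
$ ceph orch device ls --refresh   # same for the device inventory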


Quoting mabi <mabi@xxxxxxxxxxxxx>:

Hello,

A few days later the ceph status progress bar is still stuck, and the third mon is, for some unknown reason, still not deploying itself, as can be seen from the "ceph orch ls" output below:

$ ceph orch ls
NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager   ?:9093,9094      1/1  3m ago     5w   count:1
crash                           7/7  3m ago     5w   *
grafana        ?:3000           1/1  3m ago     5w   count:1
mgr                             2/2  3m ago     4w   count:2;label:mgr
mon                             2/3  3m ago     16h  count:3;label:mon
node-exporter  ?:9100           7/7  3m ago     5w   *
osd                             1/1  3m ago     -    <unmanaged>
prometheus     ?:9095           1/1  3m ago     5w   count:1
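
Would the right next step be to double-check the mon placement itself, e.g. along these lines (given that the placement is count:3;label:mon)?

$ ceph orch host ls           # shows which hosts currently carry the "mon" label
$ ceph orch ls mon --export   # dumps the mon service spec to confirm the placement rule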

Is this a bug in cephadm, and is there a workaround?

Thanks for any hints.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, September 7th, 2021 at 2:30 PM, mabi <mabi@xxxxxxxxxxxxx> wrote:

Hello

I have a test Ceph Pacific 16.2.5 cluster of 7 bare-metal nodes on Ubuntu 20.04 LTS, deployed with cephadm. I just upgraded each node's kernel and performed a rolling reboot, and now the ceph -s output is somehow stuck and the manager service is only deployed to two nodes instead of 3. Here is the ceph -s output:

  cluster:
    id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
    health: HEALTH_WARN
            OSD count 1 < osd_pool_default_size 3

  services:
    mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
    mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
    osd: 1 osds: 1 up (since 30m), 1 in (since 3w)

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
    pgs:

  progress:
    Updating crash deployment (-1 -> 6) (0s)
      [............................]
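
I assume the same stuck event could also be listed, and if it is really stale cleared, through the mgr progress module, though I have not tried that yet:

$ ceph progress         # list the progress module's current events
$ ceph progress clear   # drop stale events (as far as I understand, this only resets the display)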


Ignore the HEALTH_WARN about the OSD count, because I have not finished deploying all 3 OSDs. But you can see that the progress bar is stuck and I only have 2 managers; the third manager does not seem to start, as can be seen here:

$ ceph orch ps|grep stopped

mon.ceph1b ceph1b stopped 4m ago 4w - 2048M <unknown> <unknown> <unknown>
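
Would it make sense to simply start that daemon again and pull its logs on the host, for example:

$ ceph orch daemon start mon.ceph1b   # ask the orchestrator to start the stopped daemon
$ cephadm logs --name mon.ceph1b      # run on host ceph1b: show that daemon's journal

or would that just fight against whatever the orchestrator is currently (not) doing?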

It looks like the orchestrator is stuck and does not continue its job. Any idea how I can get it unstuck?

Best regards,

Mabi



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



