Re: ceph progress bar stuck and 3rd manager not deploying

Thank you Eugen. Indeed the answer went to Spam :(

And thanks to David for his workaround; it worked like a charm. Hopefully these patches can make it into the next Pacific release.
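For anyone else landing here with the same symptom: David's exact steps are in the linked thread below. As a general sketch only (not necessarily the workaround referenced here), a stuck cephadm orchestrator and a stale progress bar can often be nudged by failing over the active manager, which restarts the orchestrator and progress modules on the standby:

    # fail over the active mgr; the standby takes over and the cephadm
    # and progress mgr modules are reloaded
    ceph mgr fail

    # optionally clear any stale progress events left behind
    ceph progress clear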

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, September 9th, 2021 at 2:33 PM, Eugen Block <eblock@xxxxxx> wrote:

> You must have missed the response to your thread, I suppose:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
>
> Quoting mabi mabi@xxxxxxxxxxxxx:
>
> > Hello,
> >
> > A few days later, the ceph status progress bar is still stuck and the
> > third mon, for some unknown reason, is still not deploying itself, as
> > can be seen from the "ceph orch ls" output below:
> >
> > ceph orch ls
> > NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> > alertmanager   ?:9093,9094  1/1      3m ago     5w   count:1
> > crash                       7/7      3m ago     5w   *
> > grafana        ?:3000       1/1      3m ago     5w   count:1
> > mgr                         2/2      3m ago     4w   count:2;label:mgr
> > mon                         2/3      3m ago     16h  count:3;label:mon
> > node-exporter  ?:9100       7/7      3m ago     5w   *
> > osd                         1/1      3m ago     -    <unmanaged>
> > prometheus     ?:9095       1/1      3m ago     5w   count:1
> >
> > Is this a bug in cephadm, and is there a workaround?
> >
> > Thanks for any hints.
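> >
> > (A minimal diagnostic sketch, assuming a cephadm-managed Pacific cluster:
> > the orchestrator's own log and the state of the missing daemon can be
> > inspected with, for example:
> >
> > ceph log last cephadm
> > ceph orch ps --daemon-type mon
> > ceph health detail
> >
> > None of these change anything; they only show what cephadm last did and
> > which daemons it believes are stopped.)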
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >
> > On Tuesday, September 7th, 2021 at 2:30 PM, mabi mabi@xxxxxxxxxxxxx wrote:
> >
> > > Hello
> > >
> > > I have a test Ceph Pacific 16.2.5 cluster deployed with cephadm on 7
> > > nodes running Ubuntu 20.04 LTS on bare metal. I just upgraded each
> > > node's kernel and performed a rolling reboot, and now the ceph -s
> > > output is somehow stuck and the manager service is only deployed to
> > > two nodes instead of 3. Here is the ceph -s output:
> > >
> > >   cluster:
> > >     id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
> > >     health: HEALTH_WARN
> > >             OSD count 1 < osd_pool_default_size 3
> > >
> > >   services:
> > >     mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
> > >     mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
> > >     osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
> > >
> > >   data:
> > >     pools:   0 pools, 0 pgs
> > >     objects: 0 objects, 0 B
> > >     usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
> > >     pgs:
> > >
> > >   progress:
> > >     Updating crash deployment (-1 -> 6) (0s)
> > >       [............................]
> > >
> > > Ignore the HEALTH_WARN about the OSD count, because I have not
> > > finished deploying all 3 OSDs. But you can see that the progress
> > > bar is stuck and I only have 2 managers; the third one does not
> > > seem to start, as can be seen here:
> > >
> > > $ ceph orch ps | grep stopped
> > > mon.ceph1b  ceph1b  stopped  4m ago  4w  -  2048M  <unknown>  <unknown>  <unknown>
> > >
> > > It looks like the orchestrator is stuck and does not continue its
> > > job. Any idea how I can get it unstuck?
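> > >
> > > (A sketch of what could be tried, using the daemon name shown above:
> > > ask the orchestrator to start or redeploy the stopped mon, then watch
> > > whether the placement converges back to 3/3:
> > >
> > > ceph orch daemon start mon.ceph1b
> > > ceph orch daemon redeploy mon.ceph1b
> > > ceph orch ps --daemon-type mon
> > >
> > > If the orchestrator module itself is hung, these commands may simply
> > > queue up until the active mgr is failed over.)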
> > >
> > > Best regards,
> > >
> > > Mabi
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



