Re: ceph progress bar stuck and 3rd manager not deploying

I forgot to mention: the progress bar not updating is a separate bug. You
can fail the active mgr (ceph mgr fail ceph1a.guidwn in your example) to
resolve that. On the monitor side, I assume you deployed using labels?
If so, just remove the label from the host where the monitor did not
start, let the daemon fully undeploy, then re-add the label and it will
be redeployed.
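For reference, the sequence would look roughly like this (assuming the
affected host is ceph1b and your mon service spec uses the default
"mon" label; adjust both to match your setup):

$ ceph mgr fail ceph1a.guidwn          # hand the active role to the standby mgr
$ ceph orch host label rm ceph1b mon   # triggers removal of mon.ceph1b
$ ceph orch ps --daemon-type mon       # wait until mon.ceph1b is gone
$ ceph orch host label add ceph1b mon  # orchestrator redeploys the mon

You can watch the orchestrator act on this with "ceph -W cephadm".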

On Wed, Sep 8, 2021 at 7:03 AM David Orman <ormandj@xxxxxxxxxxxx> wrote:
>
> This sounds a lot like https://tracker.ceph.com/issues/51027, which is
> fixed by https://github.com/ceph/ceph/pull/42690.
>
> David
>
> On Tue, Sep 7, 2021 at 7:31 AM mabi <mabi@xxxxxxxxxxxxx> wrote:
> >
> > Hello
> >
> > I have a test Ceph Pacific 16.2.5 cluster deployed with cephadm across 7 Ubuntu 20.04 LTS bare-metal nodes. I just upgraded each node's kernel and performed a rolling reboot, and now the ceph -s output is stuck and the manager service is deployed to only two nodes instead of three. Here is the ceph -s output:
> >
> >   cluster:
> >     id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
> >     health: HEALTH_WARN
> >             OSD count 1 < osd_pool_default_size 3
> >
> >   services:
> >     mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
> >     mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
> >     osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
> >
> >   data:
> >     pools:   0 pools, 0 pgs
> >     objects: 0 objects, 0 B
> >     usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
> >     pgs:
> >
> >   progress:
> >     Updating crash deployment (-1 -> 6) (0s)
> >       [............................]
> >
> > Ignore the HEALTH_WARN about the OSD count; I have not yet finished deploying all 3 OSDs. But you can see that the progress bar is stuck and that I have only 2 managers; the third manager does not seem to start, as can be seen here:
> >
> > $ ceph orch ps|grep stopped
> > mon.ceph1b            ceph1b               stopped           4m ago   4w        -    2048M  <unknown>  <unknown>     <unknown>
> >
> > It looks like the orchestrator is stuck and does not continue its job. Any idea how I can get it unstuck?
> >
> > Best regards,
> > Mabi
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


