This sounds a lot like https://tracker.ceph.com/issues/51027, which is fixed in https://github.com/ceph/ceph/pull/42690.

David

On Tue, Sep 7, 2021 at 7:31 AM mabi <mabi@xxxxxxxxxxxxx> wrote:
>
> Hello
>
> I have a test Ceph Pacific 16.2.5 cluster of 7 bare-metal nodes on Ubuntu 20.04 LTS, deployed with cephadm. I just upgraded each node's kernel and performed a rolling reboot, and now the "ceph -s" output is stuck and the manager service is deployed to only two nodes instead of three. Here is the "ceph -s" output:
>
>   cluster:
>     id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
>     health: HEALTH_WARN
>             OSD count 1 < osd_pool_default_size 3
>
>   services:
>     mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
>     mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
>     osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
>     pgs:
>
>   progress:
>     Updating crash deployment (-1 -> 6) (0s)
>       [............................]
>
> Ignore the HEALTH_WARN about the OSD count; I have not finished deploying all 3 OSDs. But you can see that the progress bar is stuck, and I only have 2 managers: the third manager does not seem to start, as can be seen here:
>
> $ ceph orch ps | grep stopped
> mon.ceph1b    ceph1b    stopped    4m ago    4w    -    2048M    <unknown>    <unknown>    <unknown>
>
> It looks like the orchestrator is stuck and does not continue its job. Any idea how I can get it unstuck?
>
> Best regards,
> Mabi
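
Until a release containing that fix is installed, a common way to get the orchestrator moving again is to fail over the active mgr so the cephadm and progress modules restart on the standby. This is only a sketch based on the output quoted above (the mgr name "ceph1a.guidwn" is taken from the "ceph -s" output, and "ceph progress clear" is only present in builds that ship that subcommand), so adjust it to your cluster:

$ ceph mgr fail ceph1a.guidwn   # fail over to the standby; cephadm/progress modules restart on it
$ ceph progress clear           # if available, drops stuck events such as "Updating crash deployment"
$ ceph orch ps --refresh        # ask the orchestrator to refresh daemon state from the hosts

After the failover, "ceph orch ps" should start updating its REFRESHED column again and the missing daemons should get scheduled. If the progress events come back and get stuck again, the tracker issue above is the likely cause and upgrading to a build with the fix is the real solution.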