Ceph Managers dying?

Peter Childs <pchilds@xxxxxxx> · Thu, 17 Jun 2021 16:28:25 +0100

Let's try to stop this message turning into a mass moaning session about
Ceph and try to get this newbie able to use it.

I've got a Ceph Octopus cluster; it's relatively new and was deployed using
cephadm.

It was working fine, but now the managers start up, run for about 30 seconds
and then die. This repeats until systemd gives up, and I have to run
systemctl reset-failed on them to get them to try again, at which point they
fail again.
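
Roughly, the retry cycle described above looks like the sketch below. The
unit name pattern ceph-<fsid>@mgr.<name>.service is an assumption based on how
cephadm normally names its systemd units, and the fsid and daemon name are
placeholders (the real ones should show up in "cephadm ls" on the host). The
function names mgr_unit and retry_mgr are made up for illustration.

```shell
#!/bin/sh
# Sketch only: fsid and daemon name are placeholders, not real cluster values.

# Build the systemd unit name for a mgr daemon, ASSUMING the usual
# cephadm pattern ceph-<fsid>@<daemon name>.service.
mgr_unit() {
    printf 'ceph-%s@mgr.%s.service\n' "$1" "$2"
}

# Clear the failed state so systemd will try again, start the unit,
# then print the tail of its journal to see what it logged before dying.
retry_mgr() {
    unit=$(mgr_unit "$1" "$2")
    systemctl reset-failed "$unit"
    systemctl start "$unit"
    journalctl -u "$unit" -n 200 --no-pager
}
```

It would be called as "retry_mgr <fsid> <name>", where <name> is the part of
the daemon name after "mgr.". The journal output from the last step is the
first place to look for the reason the manager exits.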

How do I work out why and get them working again?

I've got 21 nodes and was looking to take it up to 32 over the next few
weeks, but that is going to be difficult if the managers are not working.

I did try Pacific and I'm happy to upgrade, but that failed to deploy more
than 6 OSDs, so I gave up and went back to Octopus.

I'm about to give up on Ceph because it looks like it's really, really
"fragile" and debugging what's going wrong is really difficult.

I guess I could give up on cephadm and go with a different provisioning
method, but I'm not sure where to start on that.

Thanks in advance.

Peter.