> Can't you hook into some system that tells you when nodes are gone and
> use that to do this, instead of waiting for timeouts?
I wish we could! The AWS autoscaler will attempt to shut down instances gracefully, but not infrequently they're shut down forcefully, and I can't find any way to tell AWS to call a script before the machine is terminated.
The only solution I can think of is an external service.
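Roughly what I have in mind, as an untested sketch - the ceph subcommands (quorum_status, mon remove) are the standard ones as far as I know, but the polling interval and the "at risk" heuristic are placeholders I made up:

#!/usr/bin/env python
# Untested sketch of the external watchdog service described above.
# Assumes the ceph CLI is on the PATH with a client.admin keyring;
# POLL_SECONDS and the quorum-risk heuristic are placeholders.
import json
import subprocess
import time

POLL_SECONDS = 30

def quorum_status():
    """Return `ceph quorum_status` output parsed as JSON."""
    out = subprocess.check_output(
        ["ceph", "quorum_status", "--format", "json"])
    return json.loads(out)

def quorum_at_risk(status):
    """True if losing one more monitor would cost us the majority."""
    total = len(status["monmap"]["mons"])
    alive = len(status["quorum_names"])
    return alive - 1 <= total // 2

def stale_monitors(status):
    """Monitors still in the map but not currently in quorum."""
    known = set(m["name"] for m in status["monmap"]["mons"])
    return known - set(status["quorum_names"])

def main():
    while True:
        status = quorum_status()
        if quorum_at_risk(status):
            for name in stale_monitors(status):
                # Shrink the monmap so the surviving monitors keep
                # their majority through the next autoscaling event.
                subprocess.check_call(["ceph", "mon", "remove", name])
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()

The idea being that the service only touches the monmap when the next host loss would actually break the majority, per the issue linked below.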
Thanks for the tip on osd down out interval - I was helping a user recover from lost quorum yesterday, and before I could tell Ceph to out a stale OSD, it did it for me. I thought I had stumbled upon some sort of black magic. Good to know that's configurable.
Chris
On Tue, Jan 20, 2015 at 7:01 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Tue, Jan 20, 2015 at 1:32 AM, Christopher Armstrong
<chris@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> We have many users who run Deis on AWS, and our default configuration places
> hosts in an autoscaling group. Ceph runs on all hosts in the cluster
> (monitors and OSDs), and users have reported losing quorum after having
> several autoscaling events (new nodes getting added, old nodes terminated).
> This makes sense, as the monitor map piles up with old entries, and
> eventually Ceph thinks there aren't enough healthy monitors to maintain
> quorum.
>
> I know that permanently losing hosts fairly frequently is likely uncommon
> for most Ceph users, but I was hoping someone might have some ideas as to
> how best to combat this. I've been thinking about this in
> https://github.com/deis/deis/issues/2877, and I've come to the conclusion
> that the clearest path forward right now is to build a service which
> interacts with the Ceph API and keeps an eye on quorum state. When losing
> another host would prevent Ceph from achieving quorum, the service would
> remove a monitor from the monitor map and mark its OSD as out, ensure the
> placement groups are replicated elsewhere (we have min_size = 3 so the dead
> OSD has nothing we need), and then remove the OSD.
More or less. You might want to be more aggressive than a single
failure. *shrug*
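For what it's worth, the per-host cleanup sequence I'd picture looks something like this (untested sketch; osd_id and mon_name are placeholders, and the status JSON key names may differ across releases):

# Untested sketch of retiring one dead host's daemons. osd_id and
# mon_name are placeholders; run with admin credentials.
import json
import subprocess
import time

def ceph_json(*args):
    out = subprocess.check_output(
        ["ceph"] + list(args) + ["--format", "json"])
    return json.loads(out)

def retire_dead_host(osd_id, mon_name):
    # 1. Mark the OSD out so its placement groups re-replicate elsewhere.
    subprocess.check_call(["ceph", "osd", "out", str(osd_id)])

    # 2. Wait for every PG to go active+clean before destroying anything.
    #    (Key names as in `ceph status -f json`; may vary by release.)
    while True:
        pgmap = ceph_json("status")["pgmap"]
        clean = sum(s["count"] for s in pgmap["pgs_by_state"]
                    if s["state_name"] == "active+clean")
        if clean == pgmap["num_pgs"]:
            break
        time.sleep(10)

    # 3. Remove the OSD from the crush map, its auth key, and the osdmap.
    subprocess.check_call(["ceph", "osd", "crush", "remove",
                           "osd.%d" % osd_id])
    subprocess.check_call(["ceph", "auth", "del", "osd.%d" % osd_id])
    subprocess.check_call(["ceph", "osd", "rm", str(osd_id)])

    # 4. Finally, drop the dead monitor from the monmap.
    subprocess.check_call(["ceph", "mon", "remove", mon_name])

The ordering matters: don't delete the OSD until the PGs it held are clean elsewhere.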
>
> Is this the best solution? Are there any configuration options I've missed
> that we could use in the interim, such as automatically considering a
> monitor/OSD as dead if it's been offline for a certain amount of time?
This already happens to the OSDs, which are automatically marked out
(but never automatically deleted from the map) ~5 minutes after they
are marked down. See "mon osd down out interval", I think?
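For reference, it should be settable in ceph.conf (value in seconds; 300 would match the ~5 minute behavior, but double-check the default on your release):

[mon]
# how long an OSD can be down before the monitors mark it out
mon osd down out interval = 300

Or at runtime with something like: ceph tell mon.* injectargs '--mon-osd-down-out-interval 300'.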
We don't ever auto-kill monitors because they're special and whatnot.
Can't you hook into some system that tells you when nodes are gone and
use that to do this, instead of waiting for timeouts?
-Greg
>
> Any tips are helpful.
>
> Thanks!
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com