Re: Automatically timing out/removing dead hosts?

On Tue, Jan 20, 2015 at 1:32 AM, Christopher Armstrong
<chris@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> We have many users who run Deis on AWS, and our default configuration places
> hosts in an autoscaling group. Ceph runs on all hosts in the cluster
> (monitors and OSDs), and users have reported losing quorum after having
> several autoscaling events (new nodes getting added, old nodes terminated).
> This makes sense, as the monitor map piles up with old entries, and
> eventually Ceph thinks there aren't enough healthy monitors to maintain
> quorum.
>
> I know that frequently losing hosts for good is probably uncommon for most
> Ceph users, but I was hoping someone might have some ideas as to how best
> to combat this. I've been thinking about this in
> https://github.com/deis/deis/issues/2877, and I've come to the conclusion
> that the clearest path forward right now is to build a service which
> interacts with the Ceph API and keeps an eye on quorum state. When losing
> another host would prevent Ceph from achieving quorum, the service would
> remove a monitor from the monitor map and mark its OSD as out, ensure the
> placement groups are replicated elsewhere (we have min_size = 3 so the dead
> OSD has nothing we need), and then remove the OSD.

More or less. You might want to be more aggressive than a single
failure. *shrug*
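
For reference, the sequence described above maps onto a handful of standard
ceph CLI calls. A minimal sketch in Python, assuming the ceph CLI and a
readable admin keyring are available on the host running it; the monitor
name, OSD id, and the coarse HEALTH_OK check are illustrative, not anything
Deis ships:

    import subprocess
    import time

    def ceph(*args):
        # Run a ceph CLI command and return its output as text.
        return subprocess.check_output(("ceph",) + args).decode()

    def remove_dead_host(mon_name, osd_id):
        # Drop the dead monitor from the monitor map so it no longer
        # counts against quorum.
        ceph("mon", "remove", mon_name)
        # Mark the host's OSD out so its placement groups get
        # re-replicated onto the surviving OSDs.
        ceph("osd", "out", str(osd_id))
        # Wait for recovery to finish (coarse check: overall health OK).
        while "HEALTH_OK" not in ceph("health"):
            time.sleep(30)
        # Remove the OSD from the CRUSH map, delete its auth key, and
        # finally remove it from the OSD map.
        ceph("osd", "crush", "remove", "osd.%d" % osd_id)
        ceph("auth", "del", "osd.%d" % osd_id)
        ceph("osd", "rm", str(osd_id))

Waiting for HEALTH_OK is a blunt way to confirm the data has been
re-replicated; a stricter check would require all PGs to be active+clean.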

>
> Is this the best solution? Are there any configuration options I've missed
> that we could use in the interim, such as automatically considering a
> monitor/OSD as dead if it's been offline for a certain amount of time?

This already happens to the OSDs, which are automatically marked out
(but never automatically deleted from the map) ~5 minutes after they
are marked down. See "mon osd down out interval", I think?
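
If that down-to-out window doesn't fit an autoscaling setup, it can be
tuned. A sketch, assuming the option name above; the 300-second value is
only an example:

    import subprocess

    # Shorten the delay before a down OSD is automatically marked out.
    # Persistently, set "mon osd down out interval = 300" under [mon] in
    # ceph.conf; this injects the same value into the running monitors.
    subprocess.check_call([
        "ceph", "tell", "mon.*", "injectargs",
        "--mon-osd-down-out-interval=300",
    ])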

We don't ever auto-kill monitors because they're special and whatnot.

Can't you hook into some system that tells you when nodes are gone and
use that to do this, instead of waiting for timeouts?
-Greg
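
Following that suggestion, the cleanup could be driven by whatever already
knows about instance termination (an AWS autoscaling lifecycle hook, the
platform's own membership store, etc.) rather than by a timeout. A loose
sketch, where get_terminated_hosts() is a hypothetical stand-in for that
notification source and remove_dead_host() is the function sketched earlier
in the thread:

    import time

    def get_terminated_hosts():
        # Hypothetical: return (mon_name, osd_id) pairs for hosts that
        # the autoscaler or cluster membership store reports as gone.
        return []

    def main():
        while True:
            for mon_name, osd_id in get_terminated_hosts():
                # React to the termination event right away instead of
                # waiting for Ceph's own down/out timers.
                remove_dead_host(mon_name, osd_id)  # from the earlier sketch
            time.sleep(10)

    if __name__ == "__main__":
        main()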

>
> Any tips are helpful.
>
> Thanks!
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



