Re: "1 hosts down" health warning?

Kyle Bader <kyle.bader@xxxxxxxxx> · Tue, 10 May 2016 12:20:18 -0700



I really like this, +1 :)

On Mon, May 9, 2016 at 9:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> We have a feature (mon_osd_down_out_subtree_limit = host) that makes it so
> that if an entire host is down (or whatever the configured hierarchy
> level is) the osds aren't automatically marked out after 5 minutes.
>
> This is confusing on an actual cluster because you see something like
>
>             48/5661 in osds are down
>
> but it never clears.  It's not until you look at the ceph osd tree output
> that you can see why they aren't getting marked out.
>
> It would be great if the health warning said something like
>
>             48/5661 in osds are down
>             1/142 hosts are down (accounting for 48/48 down osds)
>
> and the health detail said something like
>
>   host foo is down with 48 OSDs
>
> I think this would be pretty easy to implement given the mon is
> already doing the subtree-based checks.
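
A rough sketch of the aggregation being proposed, in Python rather than the
mon's C++, just to pin down the arithmetic behind the two summary lines and
the detail line. Everything here is an assumption for illustration: the
function name, the dict keys ('host', 'up', 'in'), and the choice to count
only hosts that have at least one "in" OSD. It is not how the mon stores or
walks the CRUSH hierarchy.

```python
from collections import defaultdict


def host_down_summary(osds):
    """Build the proposed health summary and detail lines.

    osds: iterable of dicts with hypothetical keys 'host' (str),
    'up' (bool) and 'in' (bool). Only "in" OSDs are considered.
    """
    # Group the "in" OSDs by the host that contains them.
    by_host = defaultdict(list)
    for osd in osds:
        if osd["in"]:
            by_host[osd["host"]].append(osd)

    total_in = sum(len(members) for members in by_host.values())
    down_in = sum(
        1 for members in by_host.values() for o in members if not o["up"]
    )

    # A host counts as down only when every "in" OSD on it is down.
    down_hosts = {
        host: len(members)
        for host, members in by_host.items()
        if all(not o["up"] for o in members)
    }
    covered = sum(down_hosts.values())

    summary = []
    if down_in:
        summary.append(f"{down_in}/{total_in} in osds are down")
    if down_hosts:
        summary.append(
            f"{len(down_hosts)}/{len(by_host)} hosts are down "
            f"(accounting for {covered}/{down_in} down osds)"
        )

    detail = [
        f"host {host} is down with {count} OSDs"
        for host, count in sorted(down_hosts.items())
    ]
    return summary, detail
```

A host with only some of its OSDs down contributes to the first line but not
to the hosts line, so the "accounting for" fraction shows how much of the
warning is explained by whole-host failures.
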
>
> Thoughts? Any takers?
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 

Kyle Bader