I really like this, +1 :)

On Mon, May 9, 2016 at 9:36 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> We have a feature (mon_osd_reporter_subtree_level = host) that makes it so
> that if an entire host is down (or whatever the configured hierarchy
> level is) the osds aren't automatically marked out after 5 minutes.
>
> This is confusing on an actual cluster because you see something like
>
>   48/5661 in osds are down
>
> but it never clears. It's not until you look at the ceph osd tree output
> that you can see why they aren't getting marked out.
>
> It would be great if the health warning said something like
>
>   48/5661 in osds are down
>   1/142 hosts are down (accounting for 48/48 down osds)
>
> and the health detail said something like
>
>   host foo is down with 48 OSDs
>
> I think this would be pretty easy to implement given the mon is
> already doing the subtree-based checks.
>
> Thoughts? Any takers?
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Kyle Bader
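
For illustration only, here is a rough sketch of the aggregation the proposed
warning would need: group OSDs by host, find hosts whose OSDs are all down,
and build the summary and detail lines quoted above. This is not monitor code.
The function name summarize_down and its inputs (a plain osd -> host dict
standing in for the CRUSH tree, plus sets of "in" and "down" OSD ids) are
assumptions made up for the example.

```python
from collections import defaultdict


def summarize_down(osd_host, in_osds, down_osds):
    """Build the proposed health summary and detail lines.

    osd_host:  dict of osd id -> host name (hypothetical stand-in for the
               CRUSH hierarchy the monitor already walks)
    in_osds:   set of osd ids currently marked "in"
    down_osds: set of osd ids currently marked "down"
    """
    # Group OSD ids under their host.
    per_host = defaultdict(set)
    for osd, host in osd_host.items():
        per_host[host].add(osd)

    # OSDs that are both "in" and "down" drive the existing warning.
    down_in = down_osds & in_osds

    # Assumption: a host counts as down when every OSD on it is down.
    down_hosts = sorted(
        host for host, osds in per_host.items() if osds and osds <= down_osds
    )
    # How many of the down+in OSDs are explained by fully down hosts.
    covered = sum(len(per_host[host] & down_in) for host in down_hosts)

    summary = [f"{len(down_in)}/{len(in_osds)} in osds are down"]
    if down_hosts:
        summary.append(
            f"{len(down_hosts)}/{len(per_host)} hosts are down "
            f"(accounting for {covered}/{len(down_in)} down osds)"
        )
    detail = [
        f"host {host} is down with {len(per_host[host])} OSDs"
        for host in down_hosts
    ]
    return summary, detail


if __name__ == "__main__":
    # Toy cluster: three hosts with two OSDs each, host "foo" entirely down.
    osd_host = {0: "foo", 1: "foo", 2: "bar", 3: "bar", 4: "baz", 5: "baz"}
    summary, detail = summarize_down(osd_host, set(range(6)), {0, 1})
    print("\n".join(summary + detail))
```

The same grouping could presumably be done at any hierarchy level (rack, row,
and so on) by swapping the osd -> host mapping for an osd -> subtree mapping.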