We have a feature (mon_osd_down_out_subtree_limit = host) that makes it so that if an entire host (or whatever the configured hierarchy level is) is down, its OSDs aren't automatically marked out after 5 minutes.

This is confusing on an actual cluster because you see something like

  48/5661 in osds are down

but it never clears. It's not until you look at the 'ceph osd tree' output that you can see why they aren't getting marked out.

It would be great if the health warning said something like

  48/5661 in osds are down
  1/142 hosts are down (accounting for 48/48 down osds)

and the health detail said something like

  host foo is down with 48 OSDs

I think this would be pretty easy to implement given the mon is already doing the subtree-based checks.

Thoughts? Any takers?

sage
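
To make the proposed output concrete, here is a rough sketch of the aggregation in Python. It is not mon code: the function name summarize_down and the flat list of dicts with 'id', 'host', 'up' and 'in' keys are assumptions made for illustration, and a real implementation would walk the CRUSH hierarchy at whatever subtree level is configured rather than only hosts.

```python
from collections import defaultdict


def summarize_down(osds):
    """Build health summary and detail lines from a flat OSD list.

    `osds` is a hypothetical input shape: an iterable of dicts with the
    keys 'id', 'host', 'up' (bool) and 'in' (bool).
    """
    osds = list(osds)
    in_osds = [o for o in osds if o["in"]]
    down_in = [o for o in in_osds if not o["up"]]

    # Group every OSD under its host.
    per_host = defaultdict(list)
    for o in osds:
        per_host[o["host"]].append(o)

    # A host counts as down only if every OSD on it is down.
    down_hosts = {
        host: members
        for host, members in per_host.items()
        if all(not m["up"] for m in members)
    }

    # How many of the down+in OSDs are explained by fully-down hosts.
    covered = sum(1 for o in down_in if o["host"] in down_hosts)

    summary = []
    if down_in:
        summary.append(f"{len(down_in)}/{len(in_osds)} in osds are down")
    if down_hosts:
        summary.append(
            f"{len(down_hosts)}/{len(per_host)} hosts are down "
            f"(accounting for {covered}/{len(down_in)} down osds)"
        )

    detail = [
        f"host {host} is down with {len(members)} OSDs"
        for host, members in sorted(down_hosts.items())
    ]
    return summary, detail


if __name__ == "__main__":
    # Toy cluster: host foo is fully down, bar and baz are healthy.
    cluster = [
        {"id": 0, "host": "foo", "up": False, "in": True},
        {"id": 1, "host": "foo", "up": False, "in": True},
        {"id": 2, "host": "bar", "up": True, "in": True},
        {"id": 3, "host": "bar", "up": True, "in": True},
        {"id": 4, "host": "baz", "up": True, "in": True},
        {"id": 5, "host": "baz", "up": True, "in": True},
    ]
    summary, detail = summarize_down(cluster)
    print("\n".join(summary + detail))
```

The "accounting for" ratio is the useful part: if it is less than N/N, some down OSDs are stragglers on otherwise healthy hosts and should still be expected to get marked out on their own.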