"1 hosts down" health warning?

We have a feature (mon_osd_down_out_subtree_limit = host) that makes it so 
that if an entire host is down (or whatever the configured hierarchy 
level is) the osds aren't automatically marked out after 5 minutes.

This is confusing on an actual cluster because you see something like

            48/5661 in osds are down

but it never clears.  It's not until you look at the ceph osd tree output 
that you can see why they aren't getting marked out.

It would be great if the health warning said something like

            48/5661 in osds are down
            1/142 hosts are down (accounting for 48/48 down osds)

and the health detail said something like

  host foo is down with 48 OSDs

I think this would be pretty easy to implement given the mon is 
already doing the subtree-based checks.
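
To make the proposal concrete, here is a rough sketch of the aggregation in 
Python. It is not the mon's actual code: the Osd tuple and the 
host_down_health function are hypothetical names made up for illustration, 
and it assumes a host counts as "down" only when every OSD under it is down.

```python
from collections import defaultdict
from typing import NamedTuple


class Osd(NamedTuple):
    """Hypothetical stand-in for one OSD's state; not a Ceph type."""
    osd_id: int
    host: str
    is_up: bool
    is_in: bool


def host_down_health(osds):
    """Return (summary_lines, detail_lines) for the proposed warning."""
    # Group OSDs by their host bucket.
    by_host = defaultdict(list)
    for osd in osds:
        by_host[osd.host].append(osd)

    in_osds = [o for o in osds if o.is_in]
    down_in = [o for o in in_osds if not o.is_up]

    # Assumption: a host is "down" when none of its OSDs are up.
    down_hosts = {
        host: members
        for host, members in by_host.items()
        if all(not o.is_up for o in members)
    }
    # Down "in" OSDs that are explained by a fully down host.
    accounted = sum(1 for o in down_in if o.host in down_hosts)

    summary, detail = [], []
    if down_in:
        summary.append(f"{len(down_in)}/{len(in_osds)} in osds are down")
    if down_hosts:
        summary.append(
            f"{len(down_hosts)}/{len(by_host)} hosts are down "
            f"(accounting for {accounted}/{len(down_in)} down osds)"
        )
        for host, members in sorted(down_hosts.items()):
            detail.append(f"host {host} is down with {len(members)} OSDs")
    return summary, detail
```

The same grouping would work for any other hierarchy level (rack, row, ...) 
by keying on that bucket instead of the host.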

Thoughts? Any takers?

sage