Fwd: FW: "1 hosts down" health warning?

Hi:

I can take this task.



From: Sage Weil <sweil@xxxxxxxxxx>
Date: Tue, May 10, 2016 at 8:11 PM
Subject: Re: "1 hosts down" health warning?
To: Wido den Hollander <wido@xxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx


On Tue, 10 May 2016, Wido den Hollander wrote:
> > On 9 May 2016 at 18:36, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >
> >
> > We have a feature (mon_osd_reporter_subtree_level = host) that makes it so
> > that if an entire host is down (or whatever the configured hierarchy
> > level is) the osds aren't automatically marked out after 5 minutes.
> >
> > This is confusing on an actual cluster because you see something like
> >
> >             48/5661 in osds are down
> >
> > but it never clears.  It's not until you look at the ceph osd tree output
> > that you can see why they aren't getting marked out.
> >
> > It would be great if the health warning said something like
> >
> >             48/5661 in osds are down
> >             1/142 hosts are down (accounting for 48/48 down osds)
> >
> > and the health detail said something like
> >
> >   host foo is down with 48 OSDs
> >
> > I think this would be pretty easy to implement given the mon is
> > already doing the subtree-based checks.
> >
> > Thoughts? Any takers?
>
> Seems like a good thing to have. I wouldn't say 'host', since
> 'mon_osd_reporter_subtree_level' could be set to rack or row if you want
> to.
>
> Maybe:
>
>             480/6720 in osds are down
>             1/14 of CRUSH type 'rack' are down (accounting for 480/480 down osds)

Yeah.  I was thinking it'd be

             1/14 ${type}s are down (accounting for 480/480 down osds)

e.g.,

             1/14 racks are down (accounting for 480/480 down osds)

just because concise is usually better.
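
To make the proposed format concrete, here is a rough sketch of the
aggregation in Python. It is not the mon code: the OSD record layout, the
function name subtree_health, and the rule that a bucket counts as "down"
only when every "in" OSD under it is down are all assumptions made for
illustration.

```python
from collections import defaultdict


def subtree_health(osds, level):
    """Return (summary_lines, detail_lines) for a list of OSD records.

    Each record is a hypothetical dict of the form
    {"up": bool, "in": bool, "location": {"host": "foo", "rack": "r1"}}.
    `level` is the CRUSH type to aggregate on, e.g. "host" or "rack".
    """
    in_per_bucket = defaultdict(int)    # "in" OSDs under each bucket
    down_per_bucket = defaultdict(int)  # "in" OSDs that are also down
    for osd in osds:
        if not osd["in"]:
            continue  # only "in" OSDs count toward the warning
        bucket = osd["location"][level]
        in_per_bucket[bucket] += 1
        if not osd["up"]:
            down_per_bucket[bucket] += 1

    total_in = sum(in_per_bucket.values())
    total_down = sum(down_per_bucket.values())
    if total_down == 0:
        return [], []

    # Assumption: a bucket is "down" when every "in" OSD under it is down.
    down_buckets = sorted(
        b for b, n in in_per_bucket.items() if down_per_bucket.get(b, 0) == n
    )
    covered = sum(down_per_bucket[b] for b in down_buckets)

    summary = [f"{total_down}/{total_in} in osds are down"]
    if down_buckets:
        summary.append(
            f"{len(down_buckets)}/{len(in_per_bucket)} {level}s are down "
            f"(accounting for {covered}/{total_down} down osds)"
        )
    detail = [
        f"{level} {b} is down with {in_per_bucket[b]} OSDs"
        for b in down_buckets
    ]
    return summary, detail
```

Passing level="rack" instead of "host" yields the "${type}s are down"
wording without any other change, which is the point of keeping the type
name in the message.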


sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


