Re: Gracefully reboot OSD node

> Op 3 augustus 2017 om 14:14 schreef Hans van den Bogert <hansbogert@xxxxxxxxx>:
> 
> 
> Thanks for answering even before I asked the questions:)
> 
> So bottom line, the HEALTH_ERR state is simply part of taking a (bunch of)
> OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds? For
> context, the CPUs are one 2609v3 per 4 OSDs. (I know; they're far from the
> fastest CPUs)
> 

Yes. Prior to Jewel, Ceph wouldn't go to HEALTH_ERR if PGs were inactive, where peering or down is an inactive state. It would just stay in HEALTH_WARN, which implies nothing is really wrong.

You can influence this behavior with mon_pg_min_inactive. It's set to 1 by default and controls how many PGs need to be inactive before the cluster goes to HEALTH_ERR. But raising it merely suppresses the error.
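As a rough sketch, raising that threshold could look like the fragment below. This is merely an illustration: the value 10 is an arbitrary example, and whether you set it in ceph.conf or inject it at runtime is up to you.

```ini
# ceph.conf on the monitor hosts -- only report HEALTH_ERR once at least
# 10 PGs are inactive. Note this suppresses the report; the PGs are still
# briefly inactive while peering.
[mon]
mon_pg_min_inactive = 10
```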

A 1.9 GHz CPU isn't the fastest indeed, and most of the peering work is single-threaded, so yes, this behavior is normal. With faster CPUs you could reduce this time.

Still, 2 to 4 seconds isn't that bad.
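To put the advice from this thread together, a planned node reboot could be sketched like this. It assumes a working `ceph` CLI with an admin keyring on the host you run it from; the node name "osd-node-1" is a placeholder. As noted above, expect blocked requests to the affected PGs while the flags are set.

```
# Prevent rebalancing and down-marking during the planned reboot
ceph osd set noout
ceph osd set nodown

ssh osd-node-1 reboot
# ... wait for the node and its OSDs to come back up ...

# Remove the flags again; expect a short peering (possibly HEALTH_ERR) blip
ceph osd unset nodown
ceph osd unset noout
ceph -s
```

Setting nodown keeps the OSDs from being marked down at all, which is why requests to their PGs block instead of being redirected; with only noout set, the cluster marks them down but won't start backfilling data elsewhere.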

Wido

> On Thu, Aug 3, 2017 at 1:55 PM, Hans van den Bogert <hansbogert@xxxxxxxxx>
> wrote:
> 
> > What are the implications of this? Because I can see a lot of blocked
> > requests piling up when using 'noout' and 'nodown'. That probably makes
> > sense though.
> > Another thing: now, when the OSDs come back online, I again see multiple
> > periods of HEALTH_ERR state. Is that to be expected?
> >
> > On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong <linghucongsong@xxxxxxx>
> > wrote:
> >
> >>
> >>
> >> set the osd noout nodown
> >>
> >>
> >>
> >>
> >> At 2017-08-03 18:29:47, "Hans van den Bogert" <hansbogert@xxxxxxxxx>
> >> wrote:
> >>
> >> Hi all,
> >>
> >> One thing which has bothered me since I began using Ceph is that a
> >> reboot of a single OSD causes a HEALTH_ERR state for the cluster for at
> >> least a couple of seconds.
> >>
> >> In the case of a planned reboot of an OSD node, should I run some extra
> >> commands in order not to go to HEALTH_ERR state?
> >>
> >> Thanks,
> >>
> >> Hans
> >>
> >>
> >>
> >>
> >>
> >
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


