Re: Gracefully reboot OSD node

Hans van den Bogert <hansbogert@xxxxxxxxx> · Thu, 3 Aug 2017 14:14:08 +0200

Thanks for answering even before I asked the questions :)

So, bottom line: is a HEALTH_ERR state simply part of taking one (or a bunch of) OSDs down? Is a HEALTH_ERR period of 2-4 seconds within normal bounds? For context, there is a 2609v3 CPU per 4 OSDs. (I know; they're far from the fastest CPUs.)
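To put a number on the "2-4 seconds" observation, one option is to poll `ceph health` during the reboot and compute the longest stretch of HEALTH_ERR. The following is only a sketch: the helper names (`health_status`, `longest_err_period`, `poll`), the polling interval, and the duration are assumptions for illustration, and it assumes the first token of the `ceph health` output is the status string.

```python
import subprocess
import time


def health_status(output):
    """Return the first token of `ceph health` output, e.g. 'HEALTH_OK'."""
    parts = output.split()
    return parts[0] if parts else "UNKNOWN"


def longest_err_period(samples):
    """Longest HEALTH_ERR stretch in seconds.

    samples is a list of (timestamp_seconds, status) tuples in time order.
    A stretch runs from the first HEALTH_ERR sample to the first later
    sample that is not HEALTH_ERR. A stretch still open at the end of
    the samples is not counted.
    """
    longest = 0.0
    start = None
    for ts, status in samples:
        if status == "HEALTH_ERR":
            if start is None:
                start = ts
        elif start is not None:
            longest = max(longest, ts - start)
            start = None
    return longest


def poll(duration=60.0, interval=0.5):
    """Sample `ceph health` every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        result = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True
        )
        samples.append((time.monotonic(), health_status(result.stdout)))
        time.sleep(interval)
    return samples
```

Running `longest_err_period(poll())` in one terminal while rebooting the node in another would give a rough figure. Its resolution is limited by the polling interval and by how long each `ceph health` call takes.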

On Thu, Aug 3, 2017 at 1:55 PM, Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:
What are the implications of this? I ask because I can see a lot of blocked requests piling up when using 'noout' and 'nodown'. That probably makes sense, though.
Another thing: now, when the OSDs come back online, I again see multiple periods of HEALTH_ERR state. Is that to be expected?

On Thu, Aug 3, 2017 at 1:36 PM, linghucongsong <linghucongsong@xxxxxxx> wrote:

Set the OSD flags noout and nodown.
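The suggestion above maps onto the `ceph osd set <flag>` and `ceph osd unset <flag>` commands. A minimal sketch of wrapping them around a planned reboot follows; the helper names (`flag_commands`, `run_all`) and the dry-run default are assumptions made for illustration, not part of Ceph.

```python
import subprocess


def flag_commands(action, flags=("noout", "nodown")):
    """Build one `ceph osd set|unset <flag>` command per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(flag_commands("set"))    # before rebooting the node
    # ... reboot the node and wait for its OSDs to rejoin ...
    run_all(flag_commands("unset"))  # after the OSDs are back up
```

While 'nodown' is set, stopped OSDs are not marked down, so requests to them can block until they return. Passing `flags=("noout",)` sets only 'noout', which may be worth trying if blocked requests are a concern. Remember to unset the flags afterwards, since the cluster reports a warning while they are set.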

At 2017-08-03 18:29:47, "Hans van den Bogert" <hansbogert@xxxxxxxxx> wrote:
Hi all,

One thing that has bothered me since I started using Ceph is that a reboot of a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds.

In the case of a planned reboot of an OSD node, should I run some extra commands to avoid going into the HEALTH_ERR state?

Thanks,

Hans

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com