Re: Gracefully reboot OSD node

Wido den Hollander <wido@xxxxxxxx> · Thu, 3 Aug 2017 13:49:29 +0200 (CEST)

> On 3 August 2017 at 13:36, linghucongsong <linghucongsong@xxxxxxx> wrote:
> 
> Set the OSD flags noout and nodown.
> 

Setting noout is correct and can help in some situations, but never set nodown unless you really need it. nodown will block I/O: you are taking down OSDs that are then not marked as down, so their PGs stay mapped to OSDs that can no longer serve requests.
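A planned reboot along these lines can be sketched as a dry-run command list. This is a minimal sketch under assumptions: `ceph osd set noout` and `ceph osd unset noout` are the real flag commands, but the node name and the `ssh ... reboot` step are hypothetical placeholders for however you actually reboot a node, and the script only prints the commands instead of running them.

```python
"""Hypothetical dry-run plan for a planned OSD node reboot.

Only `ceph osd set noout` / `ceph osd unset noout` are real Ceph CLI
commands here; the reboot step is a placeholder (assumption).
"""

from typing import List


def planned_reboot_plan(node: str) -> List[List[str]]:
    """Return the commands, in order, for rebooting one OSD node.

    noout keeps the monitors from marking the node's OSDs "out" while it
    is down, so no rebalancing/backfill starts. nodown is deliberately NOT
    set: the stopped OSDs must be marked down so their PGs re-peer and I/O
    can continue on the surviving replicas.
    """
    return [
        ["ceph", "osd", "set", "noout"],
        # Placeholder (assumption): replace with your own reboot mechanism.
        ["ssh", node, "sudo", "reboot"],
        # Before this last step, wait until the node's OSDs are back up
        # and all PGs are active+clean again (check with `ceph -s`).
        ["ceph", "osd", "unset", "noout"],
    ]


if __name__ == "__main__":
    # "osd-node-1" is a made-up host name.
    for cmd in planned_reboot_plan("osd-node-1"):
        print(" ".join(cmd))
```

Even with noout set, the PGs on the rebooting node still have to re-peer, so this avoids data movement but not the brief window of inactive PGs.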

In Hans's case, the 'problem' is that the HEALTH_ERR is correct. Since Jewel, Ceph's health goes to HEALTH_ERR as soon as any PGs are not active.

When you take down a node, the PGs that had OSDs on it have to re-peer. During peering no I/O can be performed on those PGs, and that is an ERR state.
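To see which PGs are behind the ERR state while a node is down, one can count the PGs that are not active. The sketch below assumes the `pgmap.pgs_by_state` list of `{"state_name": ..., "count": ...}` entries found in `ceph status --format json` output; verify that structure against your release. The sample numbers are made up for illustration.

```python
"""Count PGs that are not active in a `ceph status --format json` dump.

Assumption: the JSON carries pgmap.pgs_by_state as a list of
{"state_name": ..., "count": ...} entries; verify against your release.
"""

import json
from typing import Any, Dict


def inactive_pg_count(status: Dict[str, Any]) -> int:
    """Return how many PGs lack the "active" state flag.

    PG states are '+'-joined flags such as "active+clean" or "peering";
    a PG only serves I/O while "active" is among its flags.
    """
    total = 0
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        flags = entry["state_name"].split("+")
        if "active" not in flags:
            total += entry["count"]
    return total


if __name__ == "__main__":
    # Made-up example: a node just went down and some PGs are re-peering.
    sample = json.loads("""
    {"pgmap": {"pgs_by_state": [
        {"state_name": "active+clean", "count": 180},
        {"state_name": "peering", "count": 12},
        {"state_name": "active+undersized+degraded", "count": 64}
    ]}}
    """)
    print(inactive_pg_count(sample))  # 12
```

Degraded but active PGs still serve I/O; only the PGs that are peering (or otherwise not active) block it.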

Higher-clocked CPUs make peering faster, but there will always be a short moment during which I/O blocks for a set of PGs.

Wido

> 
> At 2017-08-03 18:29:47, "Hans van den Bogert" <hansbogert@xxxxxxxxx> wrote:
> 
> Hi all,
> 
> One thing that has bothered me since I started using Ceph is that rebooting a single OSD causes a HEALTH_ERR state for the cluster for at least a couple of seconds.
> 
> For a planned reboot of an OSD node, should I run some extra commands to avoid going into the HEALTH_ERR state?
> 
> Thanks,
> 
> Hans
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com