Re: rebooting nodes in a ceph cluster

David Clarke <davidc@xxxxxxxxxxxxxxx> · Fri, 20 Dec 2013 13:59:46 +1300

On 20/12/13 13:51, Sage Weil wrote:
> On Thu, 19 Dec 2013, John-Paul Robinson wrote:
>> What impact does rebooting nodes in a ceph cluster have on the health of
>> the ceph cluster?  Can it trigger rebalancing activities that then have
>> to be undone once the node comes back up?
>>
>> I have a 4-node ceph cluster; each node has 11 OSDs.  There is a single
>> pool with redundant storage.
>>
>> If it takes 15 minutes for one of my servers to reboot, is there a risk
>> that some sort of needless automatic processing will begin?
> 
> By default, we start rebalancing data once an OSD has been down for 5
> minutes.  You can adjust this (to, say, 15 minutes) with
> 
>  mon osd down out interval = 900
> 
> in ceph.conf.
> 
> sage
> 
>>
>> I'm assuming that the ceph cluster can go into a "not ok" state but that
>> in this particular configuration all the data is protected against the
>> single-node failure and there is no place for the data to migrate to, so
>> nothing "bad" will happen.
>>
>> Thanks for any feedback.
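The setting Sage mentions is in seconds, so 15 minutes is 15 × 60 = 900, and the 5-minute default corresponds to 300. As a minimal sketch (Python, chosen only for illustration), the snippet below reads that value from ceph.conf-style text with the standard library's configparser and checks a planned reboot time against it. The helper names `down_out_interval` and `reboot_fits` are hypothetical, and the section handling is a simplifying assumption rather than Ceph's full precedence rules: only [mon] and [global] are checked, [mon] wins, and spaces and underscores in the option name are treated as equivalent.

```python
import configparser

# Default quoted in this thread: rebalancing starts after 5 minutes.
DEFAULT_DOWN_OUT_INTERVAL_S = 5 * 60
OPTION = "mon osd down out interval"


def down_out_interval(conf_text: str) -> int:
    """Return the down-out interval in seconds from ceph.conf-style text.

    Simplified, hypothetical lookup (not Ceph's full precedence rules):
    only [mon] and [global] are checked, [mon] wins, and spaces and
    underscores in option names are treated as the same character.
    """
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=("#", ";")
    )
    parser.read_string(conf_text)
    for section in ("mon", "global"):
        if not parser.has_section(section):
            continue
        for key, value in parser.items(section):
            # configparser lower-cases keys; also normalise "_" to " ".
            if key.replace("_", " ") == OPTION:
                return int(value)
    return DEFAULT_DOWN_OUT_INTERVAL_S


def reboot_fits(reboot_minutes: float, interval_s: int) -> bool:
    """True if a node down this long is back before its OSDs are marked out."""
    return reboot_minutes * 60 < interval_s


if __name__ == "__main__":
    conf = "[global]\nmon osd down out interval = 900\n"
    interval = down_out_interval(conf)
    print(interval)                                # 900
    print(reboot_fits(10, interval))               # True
    print(reboot_fits(10, down_out_interval("")))  # False: 600 s vs 300 s default
```

The comparison in `reboot_fits` is strict, so a reboot that lasts exactly as long as the interval is not counted as safe; leaving some margin is prudent.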

Not directly related to Ceph, but you may want to investigate kexec[0] (the 'kexec-tools' package in
Debian-derived distributions) to get your machines to reboot more quickly.  It essentially boots
straight into the new kernel as the last step of the shutdown procedure, skipping the lengthy
BIOS/UEFI and controller firmware boot stages.

[0]: http://en.wikipedia.org/wiki/Kexec
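As a rough illustration of the kexec approach, the Python sketch below only builds and prints the `kexec -l` command that stages the running kernel for a fast reboot; it executes nothing. It assumes Debian-style kernel and initrd paths (`/boot/vmlinuz-<release>`, `/boot/initrd.img-<release>`), and `kexec_load_argv` is a hypothetical helper. `-l`, `--initrd=` and `--reuse-cmdline` are kexec-tools options, but some are architecture-specific, so check `kexec --help` on your own systems.

```python
import os
import shlex
from typing import List, Optional


def kexec_load_argv(release: Optional[str] = None) -> List[str]:
    """Build a `kexec -l` command that stages a kernel for a fast reboot.

    Hypothetical helper. Assumes Debian-style paths under /boot; other
    distributions name their kernel and initrd files differently.
    """
    # Default to the currently running kernel (same string as `uname -r`).
    release = release or os.uname().release
    return [
        "kexec",
        "-l", f"/boot/vmlinuz-{release}",        # kernel image to load
        f"--initrd=/boot/initrd.img-{release}",  # matching initramfs
        "--reuse-cmdline",                       # keep the current boot args
    ]


if __name__ == "__main__":
    # Only prints the command; actually running it needs root and kexec-tools.
    print(shlex.join(kexec_load_argv()))
```

Loading is only half of the job: `kexec -e` then jumps into the staged kernel immediately without calling shutdown(8), so on an OSD host it is usually safer to let the distribution's kexec integration perform that step at the end of a normal shutdown.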

-- 
David Clarke
Systems Architect
Catalyst IT
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com