Re: reliable monitor restarts

What do the logs of the monitor service say? Increase the log verbosity
and check the logs around the time of the crash. Do you collect any
monitoring data on the nodes, so that you can check after the fact what
the system was doing just before the crash?
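
For the verbosity part, a minimal sketch of what could go into ceph.conf
on the mon hosts, assuming the usual [mon] section. The levels below are
illustrative, not recommended values, and they take effect once the mon
is restarted:

```ini
[mon]
# Illustrative debug levels (assumption); higher means more verbose.
debug mon = 10
debug paxos = 10
debug ms = 1
```

With default settings the output should end up under /var/log/ceph/ on
each mon host. Lower the levels again afterwards, since verbose mon logs
grow quickly.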

As others have said, systemd can handle this via unit files; in fact,
this is set up for you when installing Ceph (at least in version 10.x /
Jewel). Which version of Ceph are you running?
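
If the shipped limits turn out to be too strict, a drop-in along these
lines might help. This is only a sketch: it assumes the nodes really run
systemd (Ubuntu 14.04 defaults to Upstart), that the unit is the
ceph-mon@.service template, and that your systemd still accepts
StartLimitInterval= in [Service] (newer releases call it
StartLimitIntervalSec= and place it in [Unit]). The file name and the
RestartSec/StartLimitBurst values are made up for illustration:

```ini
# Hypothetical path: /etc/systemd/system/ceph-mon@.service.d/override.conf
[Service]
Restart=on-failure
# Wait a little before each restart attempt (illustrative value).
RestartSec=10
# Allow up to 5 restarts per 30 minutes instead of 3 (illustrative value).
StartLimitInterval=30min
StartLimitBurst=5
```

After adding a drop-in, run "systemctl daemon-reload". If a mon has
already hit the start limit, "systemctl reset-failed" on that unit clears
the failed state so it can be started again.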

Also, as others have stated, the MON service is very reliable and should
not be crashing; we have had zero crashes of the mon service in 1.5 years
of running. Something is afoot.

Configuration management platforms can also ensure that daemons remain
running, but that is belt and suspenders on top of systemd.

On Sat, Oct 22, 2016 at 6:57 AM, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
> On Fri, Oct 21, 2016 at 9:31 PM, Steffen Weißgerber
> <weissgerbers@xxxxxxx> wrote:
>> Hello,
>>
>> we're running a 6 node ceph cluster with 3 mons on Ubuntu (14.04.4).
>>
>> Sometimes it happens that the mon services die and have to be restarted
>> manually.
>>
>> To have reliable service restarts I normally use D.J. Bernstein's daemontools
>> on other Linux distributions. Until now I never did this on Ubuntu.
>>
>> Is there a comparable way to configure such a watcher on services on Ubuntu
>> (i.e. under systemd)?
>
> Systemd handles this for you.
> The ceph-mon unit file has:
>
> Restart=on-failure
> StartLimitInterval=30min
> StartLimitBurst=3
>
> Note that systemd only restarts it 3 times in 30 minutes. If it fails
> more often, you'll have to reset the unit.
>
> You can fine-tune this with drop-ins, see systemd.service(5) for details.
>
>>
>> Regards and have a nice weekend.
>>
>> Steffen
>
> Kind regards,
>
> Ruben Kerkhof
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Respectfully,

Wes Dillingham
wes_dillingham@xxxxxxxxxxx
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210