Hi,

thank you for answering.

>>> Wes Dillingham <wes_dillingham@xxxxxxxxxxx> wrote on Monday, 24 October 2016 at 17:31:
> What do the logs of the monitor service say? Increase their verbosity
> and check the logs at the time of the crash. Are you doing any sort of
> monitoring on the nodes such that you can forensically check what the
> system was up to prior to the crash?
>

I'll do this. With normal logging, the only thing logged is that a new
election is initiated.

At the moment the system disk of one monitor host is read-only due to a
disk failure (a buggy SATA DOM, which we will replace), so the two
remaining monitors are doing the job.

> As others have said systemd can handle this via unit files, in fact
> this is setup for you when installing ceph (at least in version 10.x /
> jewel). Which version of Ceph are you running?
>

Our installation started with Firefly about two years ago. Some default
configuration should be active at the moment, because we never configured
anything like this ourselves; we only installed system and Ceph
updates/upgrades.

> Also as others have stated, MON service is very reliable, and should
> not be crashing, we have had zero crashes of mon service in 1.5 years
> of running. Something is afoot.
>

Yes, I fully agree. But the situation changed slightly with Hammer: the
monitors died sporadically when running ceph/rbd commands. This was never
really problematic, more annoying.

> Also configuration management platforms can ensure daemons remain
> running as well, but this is bootstrap and suspenders with systemd.
>

I'll check what's possible with those unit files and also increase the
log level to find the source of the problem.

I was on vacation for the last few days and will be back at the office
tomorrow.

Thank you for your help.

Regards

Steffen

> On Sat, Oct 22, 2016 at 6:57 AM, Ruben Kerkhof <ruben@xxxxxxxxxxxxxxxx> wrote:
>> On Fri, Oct 21, 2016 at 9:31 PM, Steffen Weißgerber
>> <weissgerbers@xxxxxxx> wrote:
>>> Hello,
>>>
>>> we're running a 6 node ceph cluster with 3 mons on Ubuntu (14.04.4).
>>>
>>> Sometimes it happens that the mon services die and have to be restarted
>>> manually.
>>>
>>> To have reliable service restarts I normally use D.J. Bernstein's daemontools
>>> on other Linux distributions. Until now I never did this on Ubuntu.
>>>
>>> Is there a comparable way to configure such a watcher on services on Ubuntu
>>> (i.e. under systemd)?
>>
>> Systemd handles this for you.
>> The ceph-mon unit file has:
>>
>> Restart=on-failure
>> StartLimitInterval=30min
>> StartLimitBurst=3
>>
>> Note that systemd only restarts it 3 times in 30 minutes. If it fails
>> more often, you'll have to reset the unit.
>>
>> You can finetune this with drop-ins, see systemd.service(5) for details.
>>
>>>
>>> Regards and have a nice weekend.
>>>
>>> Steffen
>>
>> Kind regards,
>>
>> Ruben Kerkhof
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Respectfully,
>
> Wes Dillingham
> wes_dillingham@xxxxxxxxxxx
> Research Computing | Infrastructure Engineer
> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210

--
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Registered at Amtsgericht Neubrandenburg, HRB 2457
Managing Director: Gudrun Kappich
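
For reference, the drop-in approach Ruben mentions could look roughly like
the sketch below. It is an illustration only and makes several assumptions:
the host actually runs systemd (Ubuntu 14.04 uses Upstart by default, so
this would only apply on a systemd-based release), the monitor unit is the
templated ceph-mon@.service shipped with Jewel, and the file name
restart.conf, the RestartSec value and the raised burst limit are
hypothetical choices, not values taken from this thread.

```ini
# Hypothetical drop-in: /etc/systemd/system/ceph-mon@.service.d/restart.conf
# Settings here override the same keys in the packaged ceph-mon unit file.
[Service]
# Restart the monitor whenever it exits uncleanly (same as the packaged unit).
Restart=on-failure
# Assumed value: wait a little before each restart attempt.
RestartSec=10s
# Same 30-minute window as the packaged unit quoted above.
StartLimitInterval=30min
# Assumed value: allow 5 restarts per window instead of the packaged 3.
StartLimitBurst=5
```

After creating or changing such a file, "systemctl daemon-reload" makes
systemd pick it up. If the start limit has already been hit, the unit can
be cleared with "systemctl reset-failed" followed by the unit name, which
is the "reset the unit" step described in the quoted mail.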