> On 25 October 2016 at 18:24, Steffen Weißgerber <WeissgerberS@xxxxxxx> wrote:
>
>
> Hi,
>
> thank you for answering.
>
> >>> Wes Dillingham <wes_dillingham@xxxxxxxxxxx> wrote on Monday,
> 24 October 2016 at 17:31:
> > What do the logs of the monitor service say? Increase their verbosity
> > and check the logs at the time of the crash. Are you doing any sort of
> > monitoring on the nodes such that you can forensically check what the
> > system was up to prior to the crash?
> >
>
> I'll do this. With normal logging, the only thing logged is that a new
> election is initiated.
>
> At the moment we are in the situation that the system disk of one
> monitor host is read-only due to disk failure (a buggy SATA DOM, which
> we will replace).
>

Warning! Although Monitors do not require a lot of storage or performance,
they DO require RELIABLE storage. A SATADOM is NOT reliable.

Sorry for the caps, but I'm trying to prevent a disaster here. Please buy a
datacenter-grade SSD like the Intel S3710 or Samsung SM863 for your Monitors.

If the storage underneath them starts to fail, you have a serious problem. If
you lose all your monitors, you effectively lose your cluster.

Wido

> So the two remaining monitors do the job.
>
> > As others have said systemd can handle this via unit files, in fact
> > this is set up for you when installing ceph (at least in version 10.x /
> > jewel). Which version of Ceph are you running?
> >
>
> Our installation started with Firefly some 2 years ago. At the moment
> there should be some default configuration active, because we never
> configured anything like this. We only installed system and ceph
> updates/upgrades.
>
> > Also as others have stated, MON service is very reliable, and should
> > not be crashing, we have had zero crashes of mon service in 1.5 years
> > of running. Something is afoot.
> >
>
> Yes, I fully agree. But the situation changed slightly with hammer. The
> monitors died sporadically when running ceph/rbd commands.
>
> This was never really problematic (more annoying).
>
> > Also configuration management platforms can ensure daemons remain
> > running as well, but this is belt and suspenders with systemd.
> >
>
> I'll check what's possible with those unit files and also increase the
> log level to find the source of the problem.
>
> I was on vacation for the last few days and will be back at the office
> tomorrow.
>
> Thank you for your help.
>
> Regards
>
> Steffen
>
> > On Sat, Oct 22, 2016 at 6:57 AM, Ruben Kerkhof
> > <ruben@xxxxxxxxxxxxxxxx> wrote:
> >> On Fri, Oct 21, 2016 at 9:31 PM, Steffen Weißgerber
> >> <weissgerbers@xxxxxxx> wrote:
> >>> Hello,
> >>>
> >>> we're running a 6 node ceph cluster with 3 mons on Ubuntu (14.04.4).
> >>>
> >>> Sometimes it happens that the mon services die and have to be
> >>> restarted manually.
> >>>
> >>> To have reliable service restarts I normally use D.J. Bernstein's
> >>> daemontools on other Linux distributions. Until now I never did this
> >>> on Ubuntu.
> >>>
> >>> Is there a comparable way to configure such a watcher on services on
> >>> Ubuntu (i.e. under systemd)?
> >>
> >> Systemd handles this for you.
> >> The ceph-mon unit file has:
> >>
> >> Restart=on-failure
> >> StartLimitInterval=30min
> >> StartLimitBurst=3
> >>
> >> Note that systemd only restarts it 3 times in 30 minutes. If it fails
> >> more often, you'll have to reset the unit.
> >>
> >> You can fine-tune this with drop-ins, see systemd.service(5) for
> >> details.
> >>
> >>>
> >>> Regards and have a nice weekend.
> >>>
> >>> Steffen
> >>
> >> Kind regards,
> >>
> >> Ruben Kerkhof
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > Respectfully,
> >
> > Wes Dillingham
> > wes_dillingham@xxxxxxxxxxx
> > Research Computing | Infrastructure Engineer
> > Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 210
>
> --
> Klinik-Service Neubrandenburg GmbH
> Allendestr. 30, 17036 Neubrandenburg
> District Court (Amtsgericht) Neubrandenburg, HRB 2457
> Managing Director: Gudrun Kappich