Failed monitors

Kenneth.Waegeman@xxxxxxxx (Kenneth Waegeman) · Thu, 17 Jul 2014 08:33:19 +0200

----- Message from george.ryall at stfc.ac.uk ---------
    Date: Wed, 16 Jul 2014 14:45:35 +0000
    From: george.ryall at stfc.ac.uk
Subject: Re: Failed monitors
      To: ceph-users at lists.ceph.com

> This now appears to have partially fixed itself. I am now able to  
> run commands on the cluster, though one of the monitors is down. I  
> still have no idea what was going on.

Hi George,

What do the logs /var/log/ceph/ceph-mon.*.log say?
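
If it helps, here is a minimal Python sketch for pulling the last few lines out of every monitor log at once. The function name tail_mon_logs and the default of 20 lines are my own choices for illustration; only the log path pattern comes from the question above. It assumes you run it on a monitor host with read access to /var/log/ceph.

```python
import glob
from collections import deque


def tail_mon_logs(pattern="/var/log/ceph/ceph-mon.*.log", n=20):
    """Return {path: last n lines} for every file matching pattern.

    Hypothetical helper: the name and the default n are illustrative.
    """
    tails = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as fh:
            # A deque with maxlen keeps only the final n lines in memory.
            tails[path] = [line.rstrip("\n") for line in deque(fh, maxlen=n)]
    return tails


if __name__ == "__main__":
    for path, lines in tail_mon_logs().items():
        print(f"==> {path} <==")
        print("\n".join(lines))
```

Running it on each of the three monitor hosts and comparing the output should show whether the monitor that is down is logging anything different from the other two.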

Kenneth
>
>
> George
>
> From: george.ryall at stfc.ac.uk [mailto:george.ryall at stfc.ac.uk]
> Sent: 16 July 2014 13:59
> To: ceph-users at lists.ceph.com
> Subject: [ceph-users] Failed monitors
>
> On Friday I managed to run a command I probably shouldn't have and
> knocked half our OSDs offline. By setting the noout and nodown flags and
> bringing up the OSDs on the boxes that don't also have mons running
> on them, I got most of the cluster back up by today (it took me a
> while to discover the nodown flag). However, along the way I had to
> restart the mon service a few times, and in two cases the monitors
> didn't seem to be allowed to rejoin the cluster, so I reinstalled
> the monitor service on them manually. Then this morning I started getting
> the error message I associate with the mons being down whenever I
> try to run commands on the cluster. However, restarting the mon
> service on the three machines acting as monitors does not appear to
> help.
>
> The message I get is:
> 2014-07-16 13:33:11.389331 7f6ba845b700  0 --  
> 130.246.179.122:0/1015725 >> 130.246.179.181:6789/0  
> pipe(0x7f6b98005f20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6b980097d0).fault
>
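
As far as I can tell, the message quoted above is a messenger fault line: the client at the first address failed to reach the peer at the second address, and 6789 is the default monitor port. A rough Python sketch for pulling the two endpoints out of such a line follows. The regex is my own guess at the layout based on this single sample, not an official Ceph log grammar, and parse_fault is a hypothetical name.

```python
import re

# Assumed layout, inferred from the one sample line in this thread:
#   <timestamp> <thread> 0 -- <local addr> >> <peer addr> pipe(...).fault
FAULT_RE = re.compile(
    r"--\s+(?P<local>\S+)\s+>>\s+(?P<peer>\S+)\s+pipe\(.*\)\.fault"
)


def parse_fault(text):
    """Extract the local and peer endpoints from a wrapped fault line."""
    # Mail clients wrap the line, so collapse all whitespace first.
    joined = " ".join(text.split())
    match = FAULT_RE.search(joined)
    if match is None:
        return None
    peer_host, _, rest = match.group("peer").partition(":")
    return {
        "local": match.group("local"),
        "peer_host": peer_host,
        "peer_port": int(rest.split("/")[0]),
    }


if __name__ == "__main__":
    sample = """2014-07-16 13:33:11.389331 7f6ba845b700  0 --
    130.246.179.122:0/1015725 >> 130.246.179.181:6789/0
    pipe(0x7f6b98005f20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6b980097d0).fault"""
    print(parse_fault(sample))
```

Collecting the distinct peer_host values from repeated faults would show which monitor addresses the client cannot reach.
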
> What else can I try to bring the cluster back? What logs would it be  
> useful for me to look at? Have I missed something?
>
>
> George Ryall
>
> Scientific Computing | STFC Rutherford Appleton Laboratory | Harwell  
> Oxford | Didcot | OX11 0QX
> (01235 44) 5021
>
>
>
> --
> Scanned by iCritical.

----- End message from george.ryall at stfc.ac.uk -----

-- 

Met vriendelijke groeten,
Kenneth Waegeman