Failed monitors

On Friday I managed to run a command I probably shouldn't have and knock half our OSDs offline. By setting the noout and nodown flags and bringing up the OSDs on the hosts that don't also run monitors, I had got most of the cluster back up by today (it took me a while to discover the nodown flag). Along the way, however, I had to restart the mon service a few times, and in two cases the monitors didn't seem to be allowed to rejoin the cluster, so I reinstalled the monitor service on them manually.

This morning I am getting the error message I associate with the mons being down whenever I try to run commands on the cluster. Restarting the mon service on the three machines acting as monitors does not appear to help.

The message I get is:
2014-07-16 13:33:11.389331 7f6ba845b700  0 -- 130.246.179.122:0/1015725 >> 130.246.179.181:6789/0 pipe(0x7f6b98005f20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6b980097d0).fault
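A small sketch of how a line like this can be read. It assumes the layout shown in that single line: a local address after `--`, a peer address after `>>`, then a `pipe(...)` description ending in `.fault`. The regex is derived only from this one example, so other Ceph versions or messenger types may log differently. The peer port is 6789, the conventional Ceph monitor port, which suggests that this client could not hold a connection to the monitor at 130.246.179.181. The names `parse_fault`, `is_monitor_fault` and `Fault` are hypothetical helpers, not part of any Ceph tool.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the single fault line quoted above; not an official format.
FAULT_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<thread>[0-9a-f]+)\s+(?P<level>\d+) -- "
    r"(?P<local>\S+) >> (?P<peer>\S+) pipe\(.*\)\.fault$"
)

# Conventional default port for Ceph monitors.
DEFAULT_MON_PORT = 6789


class Fault(NamedTuple):
    local: str      # this client's address, e.g. "ip:port/nonce"
    peer_host: str  # address of the daemon the client was talking to
    peer_port: int


def parse_fault(line: str) -> Optional[Fault]:
    """Return the addresses from a messenger fault line, or None if it doesn't match."""
    m = FAULT_RE.match(line.strip())
    if m is None:
        return None
    # Peer addresses look like "host:port/nonce"; drop the nonce, then split host and port.
    addr = m.group("peer").split("/", 1)[0]
    host, _, port = addr.rpartition(":")
    return Fault(m.group("local"), host, int(port))


def is_monitor_fault(fault: Fault) -> bool:
    """Heuristic: a fault against the default mon port is probably a monitor connection."""
    return fault.peer_port == DEFAULT_MON_PORT


line = ("2014-07-16 13:33:11.389331 7f6ba845b700  0 -- "
        "130.246.179.122:0/1015725 >> 130.246.179.181:6789/0 "
        "pipe(0x7f6b98005f20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6b980097d0).fault")
fault = parse_fault(line)
print(fault.peer_host, fault.peer_port, is_monitor_fault(fault))
# prints: 130.246.179.181 6789 True
```

Running the same check over each monitor address in turn would show whether the client fails against every mon or only one of them. The port-based test is only a heuristic, since monitors can be configured on other ports.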

What else can I try to bring the cluster back? Which logs would be most useful to look at? Have I missed something?


George Ryall

Scientific Computing | STFC Rutherford Appleton Laboratory | Harwell Oxford | Didcot | OX11 0QX
(01235 44) 5021
