Monitor issues

Hello,

So at some point during the night, our monitor 1 server rebooted for a so-far-unknown reason.  When it came back up, its clock was skewed by 6 hours.  There were no writes happening when I got alerted to the issue.  Ceph showed all OSDs up and in, but no op/s and 600+ blocked requests.  I logged into mon1, fixed the clock and restarted the mon.  Ceph status then showed all mons up and no skew, but still no op/s.
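(For reference, the status checks were just the standard ones, roughly:

    ceph -s              # overall status: mons, OSDs up/in, blocked requests
    ceph health detail   # breakdown of the blocked requests / any skew warning

and the clock fix was just re-syncing mon1 against our time source before restarting it.)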

I checked the OSD logs and saw cephx auth errors, which, per the Ceph website, can be caused by clock skew.  I tried restarting one OSD to check, and it hit the same thing.  So I stopped mon1, figuring the OSDs would roll over to mon2/3 and get us back up and running.

Well, the OSDs weren't showing as up, so I checked my ceph.conf to see why they weren't failing over to mon2/3 and noticed it only had the IP for mon1.  I updated ceph.conf with the IPs for mon2/3 and restarted, and the OSDs came back up and started talking again.
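(In case it matters, the [global] section now lists all three mons, something like the below, with placeholder addresses; before, mon_host only had the mon1 address:

    [global]
    mon_initial_members = mon1, mon2, mon3
    mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3
)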

So right now mon1 is offline, and I only have mon2/3 running.  Without knowing why mon1 was having issues, I don't want to start it and bring it back in just to have the cluster freak out.  At the same time, I'd like to get back to having all three mons in quorum.  I'm still reviewing the logs on mon1 to see if there are any errors that might point me to the issue.

In the meantime, my questions are:  Do you think it would be worth trying to start mon1 again and see what happens?  If it still has issues, will my OSDs fail over to mon2/3 now that the conf is correct?  Are there any other issues that might arise from bringing it back in?
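(If I do try it, my rough plan is just to start the mon and watch whether it rejoins, something like:

    systemctl start ceph-mon@mon1   # or the sysvinit equivalent, depending on release
    ceph mon stat                   # see whether mon1 shows up in the monmap
    ceph quorum_status              # confirm all three are back in quorum

and stop it again if the skew/auth errors come back.)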

The other option I could think of would be to deploy a new monitor 4 and then remove monitor 1, but I think this could lead to other issues if I'm reading the docs correctly.
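(My reading of the add/remove procedure is roughly the below, with mon4 as a placeholder hostname:

    ceph-deploy mon create mon4   # add the new monitor
    ceph mon stat                 # wait for mon4 to join quorum
    ceph mon remove mon1          # then drop mon1 from the monmap

but I'd rather not shuffle the monmap around if simply restarting mon1 is safe.)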

All our PGs are active+clean, so the cluster is in a healthy state.  The only warning is from having set the noscrub and nodeep-scrub flags and from 1 mon being down.

Any advice would be greatly appreciated.  Sorry for the long-windedness and the scattered thought process.

Thanks,
Curt  
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
