time out of sync after power failure

clewis@xxxxxxxxxxxxxxxxxx (Craig Lewis) · Fri, 26 Sep 2014 15:00:26 -0700

First, make sure you're running ntpd on all of the nodes.  I prefer to
configure ntp to set the time on boot and then keep tracking it.  Just
running ntpd will help; setting the time on boot isn't required.
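
To confirm that ntpd is actually tracking after a reboot, one option is to
read the offset of the peer it has selected from `ntpq -pn`.  The sketch
below is only an illustration: the function names are made up, it assumes
the usual ten-column `ntpq -p` layout (offset in milliseconds, selected peer
marked with `*`), and the 50 ms limit is an assumption chosen to match
Ceph's default `mon clock drift allowed` of 0.05 s.

```python
import subprocess


def read_ntpq_peers():
    """Return the raw text of `ntpq -pn` (requires ntpd running locally)."""
    return subprocess.check_output(["ntpq", "-pn"]).decode()


def selected_peer_offset_ms(ntpq_output):
    """Return the offset (ms) of the peer marked '*', or None if no peer is selected."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            # Assumed columns after the tally character:
            # remote refid st t when poll reach delay offset jitter
            fields = line[1:].split()
            return float(fields[8])
    return None


def offset_ok(offset_ms, limit_ms=50.0):
    """True if a peer is selected and its offset is within limit_ms (assumed limit)."""
    return offset_ms is not None and abs(offset_ms) <= limit_ms
```

Calling `offset_ok(selected_peer_offset_ms(read_ntpq_peers()))` on each node
gives a quick yes/no.  A `None` offset means ntpd has not selected a peer
yet, which is common for the first few minutes after boot.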

The few times I've gotten a PG stuck in peering for any length of time, I
restarted the primary OSD for those PGs.  That solved my problems.
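
To find which OSDs to restart, it can help to count how many peering PGs
each OSD is primary for.  The following is a rough sketch, not a tested
tool: the helper names are hypothetical, and it assumes each PG record from
`ceph pg dump_stuck inactive --format json` is a dict with `state`,
`acting` and possibly `acting_primary` keys.  The exact JSON layout differs
between Ceph releases, so check the output on your cluster first.

```python
import json
import subprocess
from collections import Counter


def load_stuck_pgs():
    """Fetch stuck-inactive PG records; assumes the JSON output is a plain list."""
    raw = subprocess.check_output(
        ["ceph", "pg", "dump_stuck", "inactive", "--format", "json"]
    )
    return json.loads(raw.decode())


def primaries_of_peering_pgs(pg_stats):
    """Return [(osd_id, peering_pg_count), ...], busiest primary first."""
    counts = Counter()
    for pg in pg_stats:
        if "peering" not in pg.get("state", ""):
            continue
        primary = pg.get("acting_primary")
        if primary is None or primary < 0:
            # Fall back to the first OSD in the acting set.
            acting = pg.get("acting") or []
            if not acting:
                continue
            primary = acting[0]
        counts[primary] += 1
    return counts.most_common()
```

`primaries_of_peering_pgs(load_stuck_pgs())` then lists the OSDs worth
restarting first.  How you restart an OSD depends on your init system, so
that step is left out here.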

Just make sure the monitors have quorum and time sync before you start
doing this.  It probably won't help if the monitors aren't happy.
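
A quick way to check quorum before touching any OSDs is to compare the
monitors listed in the monmap with the ones currently in quorum.  This is a
minimal sketch with hypothetical function names; it assumes the
`quorum_names` and `monmap.mons[].name` fields that
`ceph quorum_status --format json` normally reports, so verify them against
your release.

```python
import json
import subprocess


def fetch_quorum_status():
    """Return the parsed output of `ceph quorum_status --format json`."""
    raw = subprocess.check_output(["ceph", "quorum_status", "--format", "json"])
    return json.loads(raw.decode())


def mons_out_of_quorum(status):
    """Return names of monitors that are in the monmap but not in quorum."""
    in_quorum = set(status.get("quorum_names", []))
    known = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return [name for name in known if name not in in_quorum]
```

An empty list from `mons_out_of_quorum(fetch_quorum_status())` means every
known monitor is in quorum.  Clock skew between monitors is reported
separately, for example in `ceph health detail`.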

On Wed, Sep 24, 2014 at 5:04 AM, Pavel V. Kaygorodov <pasha at inasan.ru>
wrote:

> Hi!
>
> We have had some problems with our power supply, and our whole ceph
> cluster was rebooted several times.
> After a reboot the clocks on the different monitor nodes become slightly
> desynchronized, and ceph won't come up until the time is synced.
> But even after a time sync, the cluster shows about half of the pgs
> (typically; sometimes more, sometimes less) in the peering state for
> several hours, and ceph clients have no access to the data.
> I have tried to speed up the process by manually restarting monitors and
> osds, sometimes with success, sometimes without.
>
> Is there a way to speed up cluster repair after a global reboot?
>
> Thanks in advance,
>   Pavel.