On Sun, Feb 17, 2013 at 05:44:29PM -0800, Sage Weil wrote:
> On Mon, 18 Feb 2013, Chris Dunlop wrote:
>> On Sat, Feb 16, 2013 at 09:05:21AM +1100, Chris Dunlop wrote:
>>> On Thu, Feb 14, 2013 at 08:57:11PM -0800, Sage Weil wrote:
>>>> On Fri, 15 Feb 2013, Chris Dunlop wrote:
>>>>> In an otherwise seemingly healthy cluster (ceph 0.56.2), what might cause the
>>>>> mons to lose touch with the osds?
>>>>
>>>> Can you enable 'debug ms = 1' on the mons and leave them that way, in the
>>>> hopes that this happens again? It will give us more information to go on.
>>>
>>> Debug turned on.
>>
>> We haven't experienced the cluster losing touch with the osds completely
>> since upgrading from 0.56.2 to 0.56.3, but we did lose touch with osd.1
>> for a few seconds before it recovered. See below for logs (reminder: 3
>> boxes, b2 is mon-only, b4 is mon+osd.0, b5 is mon+osd.1).
>
> Hrm, I don't see any obvious clues. You could enable 'debug ms = 1' on
> the osds as well. That will give us more to go on if/when it happens
> again, and should not affect performance significantly.

Done:

    ceph osd tell '*' injectargs '--debug-ms 1'

Now to wait for it to happen again.

Chris