On Wed, 3 Apr 2013, Jim Schutt wrote:
> On 04/03/2013 11:49 AM, Gregory Farnum wrote:
> > On Wed, Apr 3, 2013 at 10:14 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> >> On Wed, Apr 3, 2013 at 10:09 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> >>> Hi Sage,
> >>>
> >>> On 04/03/2013 09:58 AM, Sage Weil wrote:
> >>>> Hi Jim,
> >>>>
> >>>> What happens if you change 'osd mon ack timeout = 300' (from the
> >>>> default of 30)? I suspect part of the problem is that the mons are just
> >>>> slow enough that the osd's resend the same thing again and it snowballs
> >>>> into more work for the monitor.
> >>>
> >>> Thanks, that helped. My OSDs aren't reconnecting to the mon any more,
> >>> and the new filesystem started up as expected.
> >>>
> >>> Hmmm, it occurs to me that I upgraded my mon hosts to 10 GbE NICs at
> >>> about the same time I started testing v0.59. Perhaps before the upgrade
> >>> I was running right at the edge of that timeout. After the NIC upgrade
> >>> the PGStat messages come flooding in at startup, and they bunch up
> >>> enough that working through the backlog pushed me over the timeout cliff?
> >>>
> >>> Is there any downside to using a large 'osd mon ack timeout', assuming I
> >>> run more than one mon? If so, I expect I'll work my way back from
> >>> 'osd mon ack timeout = 300' to see how big it needs to be to stay reliable
> >>> for my configuration.
> >>
> >> It's a timeout, so the generic downsides to larger timeouts -- if the
> >> monitor actually has gone away it's going to take the OSDs more time
> >> to connect to somebody else for their updates and reports. This will
> >> probably be most apparent if they're trying to peer and can't make
> >> progress until they get acks from the monitors, but the one they're
> >> connected to has died.
> >>
> >>
> >>> Sorry for the noise about paxos. At least it was useful
> >>> to help Joao find that debug log message that was more expensive
> >>> than expected....
> >>
> >> It's not noise -- the reason this timeout is causing problems now is
> >> that the monitor disk commits are taking so long that it looks like
> >> they've failed. Which is bad. :/ So thanks for reporting it!
> >
> > Sorry, guess I forgot some of the history since this piece at least is
> > resolved now. I'm surprised if 30-second timeouts are causing issues
> > without those overloads you were seeing; have you seen this issue
> > without your high debugging levels and without the bad PG commits (due
> > to debugging)?
>
> I think so, because that's why I started with higher debugging
> levels.
>
> But, as it turns out, I'm just in the process of returning to my
> testing of next, with all my debugging back to 0. So, I'll try
> the default timeout of 30 seconds first. If I have trouble starting
> up a new file system, I'll turn up the timeout and try again, without
> any extra debugging. Either way, I'll let you know what happens.

I would be curious to hear roughly what value between 30 and 300 is
sufficient, if you can experiment just a bit. We probably want to adjust
the default.

Perhaps more importantly, we'll need to look at the performance of the pg
stat updates on the mon. There is a refactor due in that code that should
improve life, but it's slated for dumpling.

sage

>
> -- Jim
>
> > -Greg
> > Software Engineer #42 @ http://inktank.com | http://ceph.com
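
For anyone following along, the setting discussed in this thread might look roughly like the ceph.conf fragment below. This is only a sketch: the option name and the values 30 and 300 come from the thread, but placing it under [osd] is an assumption (the thread does not say which section was used), and the right value for a given cluster is exactly what is still being worked out above.

```ini
; Sketch of a ceph.conf fragment, not a recommended configuration.
; Assumption: the option is set in the [osd] section; the thread does
; not state where it was placed.
[osd]
    ; Seconds an OSD waits for the monitor to ack before it gives up
    ; and reconnects. The thread gives 30 as the default and 300 as
    ; the value that stopped the reconnects.
    osd mon ack timeout = 300
```

As noted in the thread, the trade-off of a larger value is that OSDs take longer to move to another monitor if the one they are connected to has actually died, so working back down from 300 toward 30 is the suggested way to find a value that is just large enough.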