Re: Trouble getting a new file system to start, for v0.59 and newer

Gregory Farnum <greg@xxxxxxxxxxx> · Wed, 3 Apr 2013 10:49:56 -0700



On Wed, Apr 3, 2013 at 10:14 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Wed, Apr 3, 2013 at 10:09 AM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
>> Hi Sage,
>>
>> On 04/03/2013 09:58 AM, Sage Weil wrote:
>>> Hi Jim,
>>>
>>> What happens if you set 'osd mon ack timeout = 300' (up from the
>>> default of 30)?  I suspect part of the problem is that the mons are just
>>> slow enough that the OSDs resend the same thing again and it snowballs
>>> into more work for the monitor.
>>
>> Thanks, that helped.  My OSDs aren't reconnecting to the mon any more,
>> and the new filesystem started up as expected.
>>
>> Hmmm, it occurs to me that I upgraded my mon hosts to 10 GbE NICs at
>> about the same time I started testing v0.59.  Perhaps before the upgrade
>> I was running right at the edge of that timeout.  After the NIC upgrade
>> the PGStat messages come flooding in at startup, and they bunch up
>> enough that working through the backlog pushes me over the timeout cliff?
>>
>> Is there any downside to using a large 'osd mon ack timeout', assuming I
>> run more than one mon?  If so, I expect I'll work my way back from
>> 'osd mon ack timeout = 300' to see how big it needs to be to stay reliable
>> for my configuration.
>
> It's a timeout, so the generic downsides of larger timeouts apply: if the
> monitor actually has gone away, it's going to take the OSDs more time
> to connect to somebody else for their updates and reports. This will
> probably be most apparent if they're trying to peer and can't make
> progress until they get acks from the monitors, but the one they're
> connected to has died.
>
>
>> Sorry for the noise about paxos.  At least it was useful
>> to help Joao find that debug log message that was more expensive
>> than expected....
>
> It's not noise — the reason this timeout is causing problems now is
> that the monitor disk commits are taking so long that it looks like
> they've failed. Which is bad. :/ So thanks for reporting it!

Sorry, guess I forgot some of the history since this piece at least is
resolved now. I'd be surprised if 30-second timeouts are causing issues
without those overloads you were seeing; have you seen this issue
without your high debugging levels and without the bad PG commits (due
to debugging)?
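
For reference, a minimal sketch of how the setting discussed above might
look in ceph.conf. Placing it in the [osd] section is an assumption, and
300 is only the experimental value Sage suggested, not a recommendation;
the plan in the thread is to work back down toward the default of 30.

```ini
[osd]
    ; seconds an OSD waits for a monitor ack before giving up on that
    ; monitor and reconnecting elsewhere (default: 30)
    osd mon ack timeout = 300
```

The tradeoff is the one described above: a larger value gives a slow
monitor more room, but OSDs take longer to notice one that has really died.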
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com