On Fri, Mar 08, 2013 at 02:12:40PM +1100, Chris Dunlop wrote:
> On Thu, Feb 28, 2013 at 09:00:24PM -0800, Sage Weil wrote:
>> On Fri, 1 Mar 2013, Chris Dunlop wrote:
>>> On Sat, Feb 23, 2013 at 01:02:53PM +1100, Chris Dunlop wrote:
>>>> On Fri, Feb 22, 2013 at 05:52:11PM -0800, Sage Weil wrote:
>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>> On Fri, Feb 22, 2013 at 05:30:04PM -0800, Sage Weil wrote:
>>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>>> On Fri, Feb 22, 2013 at 04:13:21PM -0800, Sage Weil wrote:
>>>>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>>>>> On Fri, Feb 22, 2013 at 03:43:22PM -0800, Sage Weil wrote:
>>>>>>>>>>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
>>>>>>>>>>>> On Fri, Feb 22, 2013 at 01:57:32PM -0800, Sage Weil wrote:
>>>>>>>>>>>>> I just looked at the logs. I can't tell what happened to cause that 10
>>>>>>>>>>>>> second delay.. strangely, messages were passing from 0 -> 1, but nothing
>>>>>>>>>>>>> came back from 1 -> 0 (although 1 was queuing, if not sending, them).
>>>>>>>>>>
>>>>>>>>>> Is there any way of telling where they were delayed, i.e. in 1's output
>>>>>>>>>> queue or 0's input queue?
>>>>>>>>>
>>>>>>>>> Yeah, if you bump it up to 'debug ms = 20'. Be aware that that will
>>>>>>>>> generate a lot of logging, though.
>>>>>>>>
>>>>>>>> I really don't want to load the system with too much logging, but I'm happy
>>>>>>>> modifying code... Are there specific interesting debug outputs which I can
>>>>>>>> modify so they're output under "ms = 1"?
>>>>>>>
>>>>>>> I'm basically interested in everything in writer() and write_message(),
>>>>>>> and reader() and read_message()...
>>>>>>
>>>>>> Like this?
>>>>>
>>>>> Yeah. You could do 2 instead of 1 so you can turn it down. I suspect
>>>>> that this is the lion's share of what debug 20 will spam to the log, but
>>>>> hopefully the load is manageable!
>>>>
>>>> Good idea on the '2'. I'll get that installed and wait for it to happen again.
>>>
>>> FYI...
>>>
>>> To avoid running out of disk space for the massive logs, I
>>> started using logrotate on the ceph logs every two hours, which
>>> does a 'service ceph reload' to re-open the log files.
>>>
>>> In the week since doing that I haven't seen any 'slow requests'
>>> at all (the load has stayed the same as before the change),
>>> which means the issue with the osds dropping out, then the
>>> system not recovering properly, also hasn't happened.
>>>
>>> That's a bit suspicious, no?
>>
>> I suspect the logging itself is changing the timing. Let's wait and see
>> if we get lucky...
>
> We got "lucky"...
>
> ceph-mon.0.log:
> 2013-03-08 03:46:44.786682 7fcc62172700  1 -- 192.168.254.132:0/20298 --> 192.168.254.133:6801/23939 -- osd_ping(ping e815 stamp 2013-03-08 03:46:44.786679) v2 -- ?+0 0x765b180 con 0x6ab6160
> [no ping_reply logged, then later...]
> 2013-03-08 03:46:56.211993 7fcc71190700 -1 osd.0 815 heartbeat_check: no reply from osd.1 since 2013-03-08 03:46:35.986327 (cutoff 2013-03-08 03:46:36.211992)

Bugger. I just realised that the cluster had come up without the
"ms = 2" logging enabled, making these logs no more useful than the
previous ones for working out who dropped the missing ping_reply.

Injected "ms = 2" into the osds and mons, and added it to the config
files. Sigh. And we're waiting again...

Chris.
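
P.S. For anyone puzzling over that heartbeat_check line: the cutoff is
just the time of the check minus the heartbeat grace period ("osd
heartbeat grace", 20s by default), so the numbers work out as:

    cutoff     = 03:46:56.21 - 20s = 03:46:36.21
    last reply = 03:46:35.99 (~0.23s older than the cutoff)

i.e. osd.1's last reply falls just the wrong side of the cutoff, which
is what triggers the "no reply" complaint.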
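
P.P.S. In case anyone wants to turn on the same logging, it's along
these lines (a sketch only; the injectargs syntax has changed between
releases, so check it against your version):

    # at runtime, without restarting the daemons:
    ceph osd tell \* injectargs '--debug-ms 2'
    ceph mon tell \* injectargs '--debug-ms 2'

    # and persistently, in ceph.conf:
    [global]
            debug ms = 2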
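
P.P.P.S. The two-hourly rotation is nothing exotic, just a logrotate
stanza roughly like the following (paths and options are illustrative
rather than copied from the real thing), forced from cron:

    # /etc/logrotate.d/ceph (or similar):
    /var/log/ceph/*.log {
            compress
            missingok
            notifempty
            sharedscripts
            postrotate
                    service ceph reload >/dev/null 2>&1 || true
            endscript
    }

    # /etc/cron.d/ceph-logrotate:
    0 */2 * * * root logrotate -f /etc/logrotate.d/ceph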