On Sat, 23 Feb 2013, Chris Dunlop wrote:
> On Fri, Feb 22, 2013 at 04:13:21PM -0800, Sage Weil wrote:
> > On Sat, 23 Feb 2013, Chris Dunlop wrote:
> >> On Fri, Feb 22, 2013 at 03:43:22PM -0800, Sage Weil wrote:
> >>> On Sat, 23 Feb 2013, Chris Dunlop wrote:
> >>>> On Fri, Feb 22, 2013 at 01:57:32PM -0800, Sage Weil wrote:
> >>>>> On Fri, 22 Feb 2013, Chris Dunlop wrote:
> >>>>>> G'day,
> >>>>>>
> >>>>>> It seems there might be two issues here: the first being the delayed
> >>>>>> receipt of echo replies causing a seemingly otherwise healthy osd to be
> >>>>>> marked down, the second being the lack of recovery once the downed osd is
> >>>>>> recognised as up again.
> >>>>>>
> >>>>>> Is it worth my opening tracker reports for this, just so it doesn't get
> >>>>>> lost?
> >>>>>
> >>>>> I just looked at the logs. I can't tell what happened to cause that 10
> >>>>> second delay.. strangely, messages were passing from 0 -> 1, but nothing
> >>>>> came back from 1 -> 0 (although 1 was queuing, if not sending, them).
> >>
> >> Is there any way of telling where they were delayed, i.e. in the 1's output
> >> queue or 0's input queue?
> >
> > Yeah, if you bump it up to 'debug ms = 20'. Be aware that that will
> > generate a lot of logging, though.
>
> I really don't want to load the system with too much logging, but I'm happy
> modifying code... Are there specific interesting debug outputs which I can
> modify so they're output under "ms = 1"?

I'm basically interested in everything in writer() and write_message(), and
reader() and read_message()...

sage