On Fri, Jan 27, 2012 at 6:50 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Fri, 27 Jan 2012, Amon Ott wrote:
>> On Thursday 29 December 2011, Amon Ott wrote:
>> > I finally got the test cluster freed up for more Ceph testing.
>> >
>> > On Friday 23 December 2011, Gregory Farnum wrote:
>> > > Unfortunately there's not enough info in this log either. If you can
>> > > reproduce it with "mds debug = 20" and put that log somewhere, it
>> > > ought to be enough to work out what's going on, though. Sorry. :(
>> > > -Greg
>> >
>> > Here is what the MDS logs with debug 20. No idea if it really helps.
>> > The cluster is still in the broken state; should I try to reproduce
>> > with a recreated Ceph FS and debug 20? This could be GBs of logs.
>>
>> Update: I recreated the Ceph FS with release 0.40. It broke only because
>> of a btrfs bug hitting two of the four nodes (after about one day of
>> heavy load) and recovered without problems when the nodes were back.
>> Then I recreated it with ext4 as the OSD storage area and have not
>> managed to break it within four days, two of them under heavy load.
>>
>> This means that this bug is probably fixed. It might be related to the
>> automatic reconnect of the MDS, which avoids metadata inconsistencies. :)
>
> Yeah, I suspect that the problem is related to the MDS journal replay and
> the two-phase-commit stuff going on with the anchor table updates. I
> think we should keep this open until we can do MDS restart thrashing
> against a heavy link workload.
>
> Unless there was something you found/fixed before, Greg?

Unfortunately not. As I reported in the bug, there are definitely multiple
anchor destroy updates getting into the journal somehow, but I was unable
to figure out how it might have happened. I'm not sure I considered
synchronization bugs in MDS restart or client replay, though...
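
For anyone reproducing this, the logging Greg asked for goes into ceph.conf
on the MDS host; the snippet below is only a minimal sketch (the section
layout and option spelling are illustrative, not taken from this thread):

[mds]
        debug mds = 20

The daemon picks the setting up on restart, and as Amon notes the resulting
log can grow to many GBs under load, so leave room for it.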
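
A rough illustration of the "MDS restart thrashing against a heavy link
workload" test Sage mentions is sketched below in Python. The mount point,
MDS instance name, and restart command are assumptions and would need
adjusting for the actual cluster; the point is only to keep the anchor
table busy with link/unlink traffic while the MDS is repeatedly bounced,
so that any duplicate anchor destroy updates would show up in the journal.

#!/usr/bin/env python
# Sketch of MDS restart thrashing against a heavy link workload.
# Mount point, MDS name, and restart command are assumptions; adjust
# them for the local deployment.
import os
import subprocess
import threading
import time

MOUNT = "/mnt/ceph"                                     # assumed Ceph mount point
MDS_RESTART = ["/etc/init.d/ceph", "restart", "mds.a"]  # assumed restart command
WORKDIR = os.path.join(MOUNT, "linktest-%d" % os.getpid())

def link_workload(stop):
    # Create files, hard-link them, and unlink the older links again in a
    # tight loop to keep the anchor table busy.
    os.makedirs(WORKDIR)
    i = 0
    while not stop.is_set():
        try:
            target = os.path.join(WORKDIR, "file%d" % (i % 100))
            if not os.path.exists(target):
                open(target, "w").close()
            os.link(target, os.path.join(WORKDIR, "link%d" % i))
            if i >= 50:
                os.unlink(os.path.join(WORKDIR, "link%d" % (i - 50)))
        except OSError:
            # Transient failures while the MDS restarts are expected here.
            time.sleep(1)
        i += 1

stop = threading.Event()
worker = threading.Thread(target=link_workload, args=(stop,))
worker.start()

# Bounce the MDS repeatedly while the link workload keeps running.
for _ in range(20):
    time.sleep(30)
    subprocess.call(MDS_RESTART)

stop.set()
worker.join()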