On Saturday May 25, neilb@cse.unsw.edu.au wrote:
>
> I spent lots of Friday poring over the code, and most of it looks
> right, as one would expect.
> The only path that I couldn't convince myself was right is when
> journal_unmap_buffer finds that the buffer it is unmapping is
> on the committing transaction.  It seems as though this buffer would
> stay dirty and could eventually be flushed out, but there could well
> be something that I am missing.
>
> I might put a printk in here and boot into a 2.4.19-pre plus 0.9.18
> based kernel on monday and see if it shows anything.

Well, I did that...

We went all weekend with data=ordered on the problematic server and got
zero messages (one of the "raid5: multiple 1 requests" messages on each
of the other two servers, which don't seem to have the right load).

I rebooted into 2.4.18-pre8 plus ext3 0.9.18 (plus raid and nfs stuff)
plus some printks.  It came up at 12:35 and got the first "raid5:
multiple 1 requests for sector" at 13:26, at which time there was a
burst of 13 messages (actually 1 at 13:26:42, 11 at 13:26:57 and 1 at
13:27:04).

I have been logging the address of every bh that got to the

	JBUFFER_TRACE(jh, "on committing transaction");

branch of journal_unmap_buffer.

With the "raid5: multiple..." messages, I was logging the addresses of
the two bh's - the "old" (which did not get written) and the "new"
(which did).

I tried to match these bh addresses with the ones reported with "on
committing transaction", and got a very good match.

Every "old" bh (except 2) had been reported as "on committing
transaction" at around 13:08 (precisely: 13:05:50 x1, 13:08:49 x9,
13:10:04 x1).  No "new" bh has been similarly reported.

The "except 2" is because I use net_ratelimit to avoid flooding
kern.log (just in case) and it lost a few messages at both times.

While this isn't conclusive proof that it is the same buffer_head (it
could be the same piece of memory being reused: I should printk
b_rsector as well), it is a very strong indicator.

I'm guessing that in this branch of journal_unmap_buffer we really want
to clear the BH_JBDDirty flag, but I'm not willing to do that without
the OK from one of the developers....

NeilBrown

P.S. Between writing and posting this I have had three more raid5:
messages that show the same behaviour.

NeilBrown
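
P.P.S. For anyone who wants to see what the logging amounts to, it is
roughly the sketch below.  This is a reconstruction rather than the
exact hack I booted with: the helper name and message format are made
up here, and it assumes the 2.4 buffer_head layout, where the bh
pointer identifies the in-memory buffer and b_rsector gives the on-disk
sector (logging both is what would distinguish "same buffer_head" from
"same memory reused").  It would be called from the
JBUFFER_TRACE(jh, "on committing transaction") branch of
journal_unmap_buffer.

	/* Sketch only - a reconstruction of the rate-limited
	 * diagnostic described above, not the actual patch.
	 * Assumes 2.4 headers. */
	#include <linux/kernel.h>
	#include <linux/fs.h>	/* struct buffer_head, b_rsector */
	#include <linux/net.h>	/* net_ratelimit() */

	static inline void trace_unmap_on_committing(struct buffer_head *bh)
	{
		/* rate-limit so a burst cannot flood kern.log */
		if (net_ratelimit())
			printk(KERN_DEBUG
			       "journal_unmap_buffer: bh %p on committing "
			       "transaction (rsector %lu)\n",
			       bh, bh->b_rsector);
	}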
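
And to be concrete about the BH_JBDDirty guess: what I have in mind
for that branch is roughly the following.  Again, this is only a
sketch of the suggestion, not a tested or reviewed change - the helper
name is made up, jh2bh() and BH_JBDDirty are from 2.4's
include/linux/jbd.h, and whether simply clearing the bit is safe here
is exactly the question I want one of the developers to answer.

	/* Sketch of the suggestion only - NOT a reviewed JBD change. */
	#include <linux/jbd.h>

	static inline void drop_jbddirty_on_unmap(struct journal_head *jh)
	{
		struct buffer_head *bh = jh2bh(jh);

		/* BH_JBDDirty is the "dirty but journaled" state bit;
		 * the intent is to stop the now-unmapped buffer from
		 * being re-dirtied and written out after the commit. */
		clear_bit(BH_JBDDirty, &bh->b_state);
	}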