On Friday May 24, sct@redhat.com wrote:
> Hi,
>
> On Thu, May 23, 2002 at 11:32:43PM +0100, Stephen C. Tweedie wrote:
>
> > I'll get that going on a highmem smp box tomorrow to see if that
> > triggers anything, both on a single disk and layered on top of a raid5
> > array.
>
> I've got it running on a setup with data=journal, soft raid5 on an
> external scsi disk (not raid1, but you weren't seeing the dup-IO
> warnings on the raid1 journal).  No problems visible so far --- this
> is mainline 2.4.19-pre8 with no changes except for the testdrive
> debugging code.

Thanks.... I'm beginning to think that it really requires 500 students
doing random things to generate the required test load :-)

I switched back to the 2.4.17-pre2 kernel and I still got a couple of
directory corruptions, so apparently it isn't that closely related to
kernel version.  Possibly the appearance of the problem is due to an
increase or change in load characteristics - we are getting into the
busy period for students doing assignments.

I have now switched to data=ordered and haven't had any raid5: warning
messages in 40 hours, which I think is significant, though it could be
that the reduced throughput this imposes makes the problem less likely
to show up.

I think it is probably relevant that all accesses are coming via NFSv2
exported sync,no_wdelay, so all writes are O_SYNC and there are lots of
little transactions being committed, rather than fewer large ones.
Also the journal is very big (3Gig), so it should never fill up and
force checkpointing; rather, bdflush should get all the data out before
space in the journal needs to be reused.

I have been pursuing the possibility that a file gets deleted, the
block on disc gets reused, but a buffer from that file remains dirty
and eventually gets written out after it should be dead.  I spent a lot
of Friday poring over the code, and most of it looks right, as one
would expect.  The only path that I couldn't convince myself was right
is when journal_unmap_buffer finds that the buffer it is unmapping is
on the committing transaction.  It seems as though this buffer would
stay dirty and could eventually be flushed out, but there could well be
something that I am missing.  I might put a printk in here and boot
into a 2.4.19-pre plus 0.9.18 based kernel on Monday and see if it
shows anything.

I have two other servers with similar configurations; they get the
"raid5: multiple 1 requests" messages much less often (once per server
every 2-4 days) and haven't had any directory corruption.

One observation that might be interesting is that I occasionally get a
spurt of a dozen or so "raid5: multiple 1 requests" messages on the
problematic server...  Maybe this is when the file that left dirty
buffers around was a larger file.

NeilBrown
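
(Purely as an illustration of the export configuration described above,
not quoted from the message: an /etc/exports entry using the sync and
no_wdelay options, which make every NFSv2 write hit the journal as a
small synchronous transaction, might look like the line below.  The
path and client range are hypothetical.)

/export/home    192.168.0.0/24(rw,sync,no_wdelay)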
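
(A rough sketch of the sort of diagnostic printk mentioned above,
placed in journal_unmap_buffer() in fs/jbd/transaction.c at the point
where the buffer being unmapped is found on the committing transaction.
The variable names follow the 2.4 jbd conventions, but the surrounding
code is paraphrased, not quoted from the source tree.)

	if (transaction == journal->j_committing_transaction) {
		/* Log the block so it can be matched up later against
		 * any "raid5: multiple 1 requests" warnings. */
		printk(KERN_DEBUG
		       "journal_unmap_buffer: block %lu unmapped while on "
		       "the committing transaction (dirty=%d)\n",
		       bh->b_blocknr, buffer_dirty(bh) ? 1 : 0);
		/* existing handling of this case continues here */
	}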