Hi,

On Mon, May 13, 2002 at 03:06:48PM +1000, Neil Brown wrote:
>
> Hi all (and developers in particular)
>
> I just got bitten by this Assertion. The one that starts as in the
> subject, and ends with:
> "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"
> Google reminds me that it was mentioned a few times earlier this year,
> but I couldn't find any statement saying that it has been fixed.
> I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.
> So my question is: has this been fixed yet?

Twice. :)

> What seems to trigger it for me is reading the block device file.

Yep.

The assertion failure can trigger in two ways. One is reproducible
under normal filesystem activity, and has been fixed in 2.4 for a
while --- it was a missing "goto repeat" in one special case branch
which meant that we could drop a lock and fail to re-test an
important condition.

The second case that can cause this is the one you are seeing,
involving block device IO in parallel with filesystem IO. That one
has always been there for block write IO (and in fact it's arguably
right for ext3 to oops if out-of-band writes are causing ext3's own
metadata to be written out-of-order), but as of 2.4.11, we now have
the page-cache/buffer-cache aliasing interactions which can cause ext3
to see locked buffers even if you are only reading from the buffered
block device.

Current 2.4 and 2.5 don't handle that well --- in fact they can
corrupt the fs when it happens (even for ext2). I posted a fix 3 or 4
weeks ago, as well as a patch which lets ext3 recover from the
situation properly. It's not in the upstream kernels yet --- Al Viro
raised a question over whether it's the best fix, but it's definitely
the simplest one as far as I can see.

Those fixes are currently all in ext3 CVS, and are part of the patch
akpm just posted. I've got one more thing to sort out --- O_SYNC
behaviour in data-journaled mode --- and I'll push it all to Linus and
Marcelo.

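For anyone unfamiliar with the first failure mode, here is a minimal
userspace sketch in C of the "drop the lock, then re-test" pattern.
It is not the ext3/JBD code: fake_buffer, fake_lock, wait_one_io and
both claim functions are hypothetical names made up for illustration,
and the "I/O wait" is simulated with a simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer; NOT the kernel's struct buffer_head. */
struct fake_buffer {
    bool locked;      /* plays the role of the BH_Lock bit */
    int  pending_io;  /* simulated I/O completions needed before unlock */
};

/* Hypothetical stand-in for a lock that must be dropped before sleeping. */
struct fake_lock {
    bool held;
};

static void lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct fake_lock *l) { assert(l->held); l->held = false; }

/* Simulated sleep: completes one pending I/O. The buffer may still be
 * locked afterwards if more I/O is outstanding. */
static void wait_one_io(struct fake_buffer *b)
{
    if (b->pending_io > 0)
        b->pending_io--;
    if (b->pending_io == 0)
        b->locked = false;
}

/* Buggy shape: after dropping the lock and waiting, carry on without
 * re-testing. Returns whether the buffer really was unlocked. */
static bool claim_buffer_no_retest(struct fake_lock *l, struct fake_buffer *b)
{
    lock_acquire(l);
    if (b->locked) {
        lock_release(l);
        wait_one_io(b);
        lock_acquire(l);
        /* missing: goto repeat */
    }
    return !b->locked;
}

/* Fixed shape: every time the lock is dropped, start over and re-test.
 * Returns the number of retries; on return the lock is held and the
 * buffer is unlocked. */
static int claim_buffer(struct fake_lock *l, struct fake_buffer *b)
{
    int retries = 0;
repeat:
    lock_acquire(l);
    if (b->locked) {
        lock_release(l);
        wait_one_io(b);
        retries++;
        goto repeat;  /* re-test the condition under the lock */
    }
    return retries;
}
```

With a buffer that needs two I/O completions, the first variant
returns with the buffer still locked, which is the state the assertion
checks for; the second loops until the condition holds under the lock.
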
> I have a program that runs every 10 hours and reads all the inode
> tables straight off the block device and checks the disc usage against
> what is stored in the quota file. Any difference that is found is
> logged. If the same difference gets logged 3 times in a row, I
> correct it.
>
> The Assertion failure, which has now happened twice, corresponds with
> running this program.... so I might not run it quite so often any
> more.

With the current ext3 patches, it should be safe enough (or at least
safe against kernel corruption --- reading a live filesystem via the
bdev is always unsafe from the application point of view because you
never get a consistent view of the fs.)

I saw the same thing happening with dump(8) on a live fs, and testing
has shown the current patches to fix that.

Cheers,
 Stephen