Hi,

On Tue, May 14, 2002 at 03:03:20PM +1000, Neil Brown wrote:
> On Monday May 13, sct@redhat.com wrote:
> >
> > The second case that can cause this is the one you are seeing,
> > involving block device IO in parallel with filesystem IO.  That one
> > has always been there for block write IO (and in fact it's arguably
> > right for ext3 to oops if out-of-band writes are causing ext3's own
> > metadata to be written out-of-order), but as of 2.4.11, we now have
> > the page-cache/buffer-cache aliasing interactions which can cause ext3
> > to see locked buffers even if you are only reading from the buffered
> > block device.
>
> I had a bit of a look at the block-device-reading code and it seems to
> me that there could be no (direct) interaction between this code and
> the code ext3 used to access inodes.

Oh, there is.

> When reading a block device the page cache is used and anonymous
> buffer_heads get allocated for each page.  These have to be completely
> separate from the buffer_heads allocated for the buffer cache that ext3
> uses.  So a read on the block device could never make an ext3 buffer
> locked.

Actually, they are not separate: the bh'es DO get hashed.  As of
2.4.11, what happens is this:

  ext3 calls getblk() to access a bh for some metadata;
  getblk() calls grow_buffers() to create the new bh;
  grow_buffers() calls:
      grow_dev_page() to create the page and
      hash_page_buffers() to put the bh'es on the buffer hashes.
  ext3 starts IO: the bh is locked, but the page is not.

Then we get a page cache read:

  find the page and call block_read_full_page();
  block_read_full_page() sees that the page is not uptodate, so we
  need to do IO; and the page is not locked, so it is OK to start
  the IO!

  block_read_full_page():
      checks the bh uptodate bit: it is not uptodate, because ext3
      is still reading it;
      locks the bh (implicitly waiting for ext3's IO to complete);
      submits the bh for read, despite the fact that the IO is now
      complete and the bh is now uptodate.

and at that point, (a) any modifications ext3 has made to the bh are
lost, and (b) the bh is unexpectedly locked, which can trip the ext3
assert failure.

The ext3 patch includes a small chunk to fs/buffer.c to fix the core
VFS problem: block_read_full_page MUST test the bh uptodate state
again after locking the bh.

Cheers,
 Stephen
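
For illustration, the recheck described above can be modelled in a few
lines of userspace C.  This is only a toy sketch, not the actual patch:
struct toy_bh and every toy_* function are hypothetical stand-ins for
the kernel's buffer_head handling, and "waiting for ext3's IO" is
modelled by simply completing that IO when the lock is taken.

```c
#include <stdbool.h>

/* Toy stand-in for a buffer_head; NOT the kernel structure. */
struct toy_bh {
    bool uptodate;        /* contents are valid */
    bool locked;          /* IO is in flight on this bh */
    int  reads_submitted; /* reads started by the page-cache path */
};

/*
 * Models locking a bh.  If it is already locked, the owner's (ext3's)
 * IO is assumed to complete while we wait: that marks the bh uptodate
 * and unlocks it.  Then we take the lock ourselves.
 */
static void toy_lock(struct toy_bh *bh)
{
    if (bh->locked) {
        bh->uptodate = true;
        bh->locked = false;
    }
    bh->locked = true;
}

static void toy_unlock(struct toy_bh *bh)
{
    bh->locked = false;
}

/* Models starting a read; the bh stays locked until that IO ends. */
static void toy_submit_read(struct toy_bh *bh)
{
    bh->reads_submitted++;
}

/* Ordering described in the mail: test, lock, submit. */
static void toy_read_unfixed(struct toy_bh *bh)
{
    if (bh->uptodate)
        return;
    toy_lock(bh);        /* ext3's IO finishes while we wait here */
    toy_submit_read(bh); /* redundant read; bh is left locked */
}

/* Same path with the uptodate test repeated after taking the lock. */
static void toy_read_fixed(struct toy_bh *bh)
{
    if (bh->uptodate)
        return;
    toy_lock(bh);
    if (bh->uptodate) {  /* someone else completed the read */
        toy_unlock(bh);
        return;
    }
    toy_submit_read(bh);
}
```

Starting from a bh that ext3 has locked for reading (uptodate clear,
locked set), toy_read_unfixed() submits one extra read and leaves the
bh locked, while toy_read_fixed() submits none and leaves it unlocked.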