Sorry to reply to self, but I'm now pretty sure that I understand this
problem. (Of course this insight came mere hours after I sent this email --
and not in the previous 4 days of staring at it.)

It's likely the same issue fixed by commit

    1b774f669b4b02f4d2abf2792362ab72a2e124ab
    ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()

In the previous case, in no-journal mode an about-to-be-freed metadata block
is marked dirty and available for writeback. The block is then marked free,
and re-used as a data block for a different inode; the writeback takes place,
corrupting the data block.

In this case, the newly-freed block is re-used as a *metadata* block for a
different inode. Hence the same pattern we were seeing before: eh_entries = 0,
eh_max = 340.

These inodes were left on systems from kernels without the above patch.
Accessing the files on *patched* kernels will still make the BUG fire, hence
the confusion.

Thanks,
Curt

On Wed, Sep 23, 2009 at 9:27 AM, Curt Wohlgemuth <curtw@xxxxxxxxxx> wrote:
> We've been seeing sporadic inode corruption on our ext4 partitions which
> we've been trying to analyze, without much success. I'm wondering if
> anybody might have some clues as to where things might be going wrong.
>
> We find out about the corruption via a BUG firing in ext4_ext_get_blocks():
>
>     /*
>      * consistent leaf must not be empty;
>      * this situation is possible, though, _during_ tree modification;
>      * this is why assert can't be put in ext4_ext_find_extent()
>      */
>     BUG_ON(path[depth].p_ext == NULL && depth != 0);
>
> Of course, this fires long after the inode in question is corrupted.
> With some diagnostics added in front of this bug, we can find the inodes;
> they all have characteristics like this:
>
> Output from debugfs' stat command:
>
>     Inode: 1195575  Type: regular  Mode: 0600  Flags: 0x80000
>     Generation: 2821101782  Version: 0x00000001
>     User: 35800  Group: 5000  Size: 8400896
>     File ACL: 0  Directory ACL: 0
>     Links: 1  Blockcount: 8
>     Fragment:  Address: 0  Number: 0  Size: 0
>     ctime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
>     atime: 0x4a9f7ff7 -- Thu Sep 3 01:36:07 2009
>     mtime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
>     EXTENTS:
>
> Note that no data blocks are printed out here.
>
> Following the actual extent tree, it always looks like this:
>
>     in-inode extent header:
>         eh_magic:   0xf30a
>         eh_entries: 1
>         eh_max:     4
>         eh_depth:   1
>
>     in-inode extent index 0:
>         ei_block:   0
>         ei_leaf_lo: 36738577
>         ei_leaf_hi: 0
>
>     leaf node header (at block 36738577):
>         eh_magic:   0xf30a
>         eh_entries: 0
>         eh_max:     340
>         eh_depth:   0
>
> The i_size value of the inode will vary, from 8192 to 8400896. But the
> i_blocks value is *always* 8.
>
> The extent tree always has depth of 1 in the in-inode header, and a valid
> leaf node header; but the leaf node header always has 0 entries. This is
> what's causing the BUG above to fire.
>
> We believe the general pattern of user space calls to create these files is
> something like this:
>
>     open(O_DIRECT)
>     fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8400896)
>     < various writes to the file >
>     fallocate(fd, 0, 0, actual_size + BLOCK_SIZE)
>     ftruncate(fd, actual_size)
>
> The second fallocate() call without KEEP_SIZE allows the following
> ftruncate to actually truncate the file -- a known issue recently fixed by
> Jiaying Zhang (but her fix is not in our kernel yet). "actual_size" can be
> 0 at times.
>
> I can't think of any actions that would cause the i_size to be so large, yet
> the i_blocks always be 8.
> Looking at the code in
>
>     ext4_ext_remove_space()
>     ext4_ext_rm_leaf()
>     ext4_ext_rm_idx()
>
> I don't see a way for the extent tree to take the shape above. There are no
> errors that I can see around the time the corrupted inodes are created. It
> *seems* as though the corruption is coming during truncation, but all our
> efforts to reproduce this with small test cases have so far failed.
>
> We're using a 2.6.26 code base, with most of the latest ext4 patches
> applied.
>
> Any insights/ruminations/guesses as to what might be happening are welcome.
>
> Thanks,
> Curt
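
For anyone who wants to scan for inodes left behind in this state, here is a
rough user-space sketch of the check (not kernel code). It assumes the block
named by ei_leaf_lo has already been read into a buffer. leaf_is_empty() and
ext_capacity() are names made up for illustration, and the 12-byte header and
entry sizes are taken from the on-disk extent layout.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC 0xf30a  /* eh_magic value shown in the dumps above */
#define EXT_HDR_SIZE   12      /* on-disk header: four le16 fields + one le32 */
#define EXT_ENTRY_SIZE 12      /* on-disk extent or index entry */

/* Read a little-endian 16-bit field from a raw block buffer. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Number of entries that fit after the header in `size` bytes. */
static unsigned ext_capacity(size_t size)
{
    return (unsigned)((size - EXT_HDR_SIZE) / EXT_ENTRY_SIZE);
}

/*
 * Hypothetical helper: returns 1 if `blk` starts with a well-formed leaf
 * header (valid magic, eh_depth == 0) that has eh_entries == 0, i.e. the
 * shape reported above; returns 0 otherwise.
 */
static int leaf_is_empty(const uint8_t *blk, size_t len)
{
    if (len < EXT_HDR_SIZE)
        return 0;

    uint16_t magic   = get_le16(blk);      /* eh_magic   at offset 0 */
    uint16_t entries = get_le16(blk + 2);  /* eh_entries at offset 2 */
    uint16_t depth   = get_le16(blk + 6);  /* eh_depth   at offset 6 */

    return magic == EXT4_EXT_MAGIC && depth == 0 && entries == 0;
}
```

With 12-byte entries, ext_capacity(4096) gives 340 and ext_capacity(60) gives
4, which lines up with the eh_max values in the dumps if the filesystem uses
4096-byte blocks (an assumption; the block size is not stated above).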