On Sep 23, 2009 15:50 -0700, Curt Wohlgemuth wrote:
> Sorry to reply to self, but I'm now pretty sure that I understand this
> problem. (Of course this insight came mere hours after I sent this
> email -- and not in the previous 4 days of staring at it.)
>
> It's likely the same issue fixed by
>
> commit 1b774f669b4b02f4d2abf2792362ab72a2e124ab
>     ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()

I was going to say that this sounded like a familiar problem, but you
already did the leg (well, mouse) work.

> In the previous case, in no-journal mode an about-to-be-freed metadata
> block is marked dirty and available for writeback. The block is then
> marked free, and re-used as a data block for a different inode; the
> writeback takes place, corrupting the data block.
>
> In this case, the newly-freed block is re-used as a *metadata* block
> for a different inode. Hence the same pattern we were seeing before:
> eh_entries = 0, eh_max = 340.
>
> These inodes were left on systems from kernels without the above
> patch. Accessing the files on *patched* kernels will still make the
> BUG fire, hence the confusion.
>
> Thanks,
> Curt
>
>
> On Wed, Sep 23, 2009 at 9:27 AM, Curt Wohlgemuth <curtw@xxxxxxxxxx> wrote:
> > We've been seeing sporadic inode corruption on our ext4 partitions which
> > we've been trying to analyze, without much success. I'm wondering if
> > anybody might have some clues as to where things might be going wrong.
> >
> > We find out about the corruption via a BUG firing in ext4_ext_get_blocks():
> >
> >     /*
> >      * consistent leaf must not be empty;
> >      * this situation is possible, though, _during_ tree modification;
> >      * this is why assert can't be put in ext4_ext_find_extent()
> >      */
> >     BUG_ON(path[depth].p_ext == NULL && depth != 0);
> >
> > Of course, this fires long after the inode in question is corrupted.
> > With some diagnostics added in front of this bug, we can find the inodes;
> > they all have characteristics like this:
> >
> > Output from debugfs' stat command:
> >
> >     Inode: 1195575  Type: regular  Mode: 0600  Flags: 0x80000
> >     Generation: 2821101782  Version: 0x00000001
> >     User: 35800  Group: 5000  Size: 8400896
> >     File ACL: 0  Directory ACL: 0
> >     Links: 1  Blockcount: 8
> >     Fragment: Address: 0  Number: 0  Size: 0
> >     ctime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> >     atime: 0x4a9f7ff7 -- Thu Sep 3 01:36:07 2009
> >     mtime: 0x4a9f8009 -- Thu Sep 3 01:36:25 2009
> >     EXTENTS:
> >
> > Note that no data blocks are printed out here.
> >
> > Following the actual extent tree, it always looks like this:
> >
> >     in-inode extent header:
> >         eh_magic:   0xf30a
> >         eh_entries: 1
> >         eh_max:     4
> >         eh_depth:   1
> >
> >     in-inode extent index 0:
> >         ei_block:   0
> >         ei_leaf_lo: 36738577
> >         ei_leaf_hi: 0
> >
> >     leaf node header (at block 36738577):
> >         eh_magic:   0xf30a
> >         eh_entries: 0
> >         eh_max:     340
> >         eh_depth:   0
> >
> > The i_size value of the inode will vary, from 8192 to 8400896. But the
> > i_blocks value is *always* 8.
> >
> > The extent tree always has depth of 1 in the in-inode header, and a valid
> > leaf node header; but the leaf node header always has 0 entries. This is
> > what's causing the BUG above to fire.
> >
> > We believe the general pattern of user space calls to create these files is
> > something like this:
> >
> >     open(O_DIRECT)
> >     fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 8400896)
> >     < various writes to the file >
> >     fallocate(fd, 0, 0, actual_size + BLOCK_SIZE)
> >     ftruncate(fd, actual_size)
> >
> > The second fallocate() call without KEEP_SIZE allows the following
> > ftruncate to actually truncate the file -- a known issue recently fixed by
> > Jiaying Zhang (but her fix is not in our kernel yet). "actual_size" can be
> > 0 at times.
> >
> > I can't think of any actions that would cause the i_size to be so large, yet
> > the i_blocks always be 8.
> > Looking at the code in
> >
> >     ext4_ext_remove_space()
> >     ext4_ext_rm_leaf()
> >     ext4_ext_rm_idx()
> >
> > I don't see a way for the extent tree to take the shape above. There are no
> > errors that I can see around the time the corrupted inodes are created. It
> > *seems* as though the corruption is coming during truncation, but all our
> > efforts to reproduce this with small test cases have so far failed.
> >
> > We're using a 2.6.26 code base, with most of the latest ext4 patches
> > applied.
> >
> > Any insights/ruminations/guesses as to what might be happening are welcome.
> >
> > Thanks,
> > Curt
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
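
The corruption signature described in the quoted report (a depth-1 in-inode
header whose index points at a leaf with a valid magic but eh_entries = 0)
can be checked mechanically when scanning for other affected inodes. What
follows is only a minimal Python sketch, not kernel code. It assumes the
usual 12-byte little-endian extent header (eh_magic, eh_entries, eh_max,
eh_depth, eh_generation) and 12-byte extent entries. The helper names
parse_extent_header, is_empty_leaf_under_index and max_entries are
hypothetical, the eh_generation value of 0 is made up for the example, and
reading the raw bytes from the device is left out.

```python
import struct
from collections import namedtuple

EXT4_EXT_MAGIC = 0xF30A

# Assumed on-disk layout: four little-endian u16 fields, then a u32 generation.
_HEADER = struct.Struct("<HHHHI")
_ENTRY_SIZE = 12  # assumed size of one extent or index entry, in bytes

ExtentHeader = namedtuple("ExtentHeader", "magic entries max depth generation")


def parse_extent_header(buf):
    """Decode the first 12 bytes of buf as an extent header (hypothetical helper)."""
    return ExtentHeader(*_HEADER.unpack_from(buf, 0))


def max_entries(node_bytes):
    """Number of 12-byte entries that fit after the header in a node of node_bytes."""
    return (node_bytes - _HEADER.size) // _ENTRY_SIZE


def is_empty_leaf_under_index(parent, leaf):
    """True when a depth-1 index node with entries points at a valid but empty leaf."""
    return (
        parent.magic == EXT4_EXT_MAGIC
        and leaf.magic == EXT4_EXT_MAGIC
        and parent.depth == 1
        and parent.entries >= 1
        and leaf.depth == 0
        and leaf.entries == 0
    )


# Header values copied from the report; the generation field (0) is an assumption.
in_inode = parse_extent_header(_HEADER.pack(0xF30A, 1, 4, 1, 0))
leaf = parse_extent_header(_HEADER.pack(0xF30A, 0, 340, 0, 0))

print(is_empty_leaf_under_index(in_inode, leaf))
```

Under these assumptions, max_entries(60) gives 4 for the 60-byte in-inode
area and max_entries(4096) gives 340, which is consistent with the eh_max
values in the report if the filesystem uses 4096-byte blocks (the report
does not state the block size).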