On Thu, Dec 10, 2009 at 9:44 AM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote: > Curt Wohlgemuth wrote: >> This fixes a bug in which new blocks returned from an extent created with >> ext4_ext_zeroout() can have dirty metadata still associated with them. > > Do you have a testcase or at least end result details for this corruption? <sigh> I wish I had a testcase. I do know some facts about the workload(s) involved: 1. No journal 2. Nearly full ext2 partition, mounted as ext4 -- tune2fs used to turn on "extent dir_index" 3. Existing non-extent based files are removed, new (extent-based) files are created. 4. Files created with O_DIRECT, ~8MB, fallocate(KEEP_SIZE) used, writes are generally in increasing offset order, ~64K usually. 5. Some (~1%) writes are to holes in the fallocate'd file; we're not yet using Mingming's patches, so these writes fall back to buffered writes. 6. Lots of processes, lots of threads. 7. End result is a block (or sometimes 2 blocks) of all zeros, at a block-aligned offset. This zero'ed block is *always* somewhere in a 14-block extent created from using ext4_ext_zeroout(). Lots and lots of tracing showed that these blocks were originally dirtied metadata blocks from unlinked (hence truncated) non-extent based files. These indirect blocks in ext4 have their direct block pointers turned to zero as the blocks are truncated, and these indirect blocks are marked as dirty. Even though bforget() is called on these metadata blocks, bforget() won't wait on the buffer. Waiting on the buffer is meant to happen at allocate time, for "new" buffers. > >> Signed-off-by: Curt Wohlgemuth <curtw@xxxxxxxxxx> >> --- >> >> This is for the problem I reported on 23 Nov ("Bug in extent zeroout: blocks >> not marked as new"). I'm not seeing the corruption with this fix that I was >> seeing without it. >> >> diff -uprN orig/fs/ext4/extents.c new/fs/ext4/extents.c >> --- orig/fs/ext4/extents.c 2009-12-09 15:09:25.000000000 -0800 >> +++ new/fs/ext4/extents.c 2009-12-09 15:09:37.000000000 -0800 >> @@ -2474,9 +2474,21 @@ static int ext4_ext_zeroout(struct inode >> submit_bio(WRITE, bio); >> wait_for_completion(&event); >> >> - if (test_bit(BIO_UPTODATE, &bio->bi_flags)) >> + /* On success, we need to insure all metadata associated > > nitpick, "ensure" I think, although I guess they're mostly synonymous > today so do with that what you will :) I'll let Ted decide :-) > > -Eric > >> + * with each of these blocks is unmapped. */ >> + if (test_bit(BIO_UPTODATE, &bio->bi_flags)) { >> + sector_t block = ee_pblock; >> + >> ret = 0; >> - else { >> + done = 0; >> + while (done < len) { >> + unmap_underlying_metadata(inode->i_sb->s_bdev, >> + block); >> + >> + done++; >> + block++; >> + } >> + } else { >> ret = -EIO; >> break; >> } >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html