Re: [PATCH] ext4: Ensure zeroout blocks have no dirty metadata

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 10, 2009 at 9:44 AM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
> Curt Wohlgemuth wrote:
>> This fixes a bug in which new blocks returned from an extent created with
>> ext4_ext_zeroout() can have dirty metadata still associated with them.
>
> Do you have a testcase or at least end result details for this corruption?

<sigh>  I wish I had a testcase.  I do know some facts about the
workload(s) involved:

1. No journal
2. Nearly full ext2 partition, mounted as ext4 -- tune2fs used to turn
on "extent dir_index"
3. Existing non-extent based files are removed, new (extent-based)
files are created.
4. Files created with O_DIRECT, ~8MB, fallocate(KEEP_SIZE) used,
writes are generally in increasing offset order, ~64K usually.
5. Some (~1%) writes are to holes in the fallocate'd file; we're not
yet using Mingming's patches, so these writes fall back to buffered
writes.
6. Lots of processes, lots of threads.
7. End result is a block (or sometimes 2 blocks) of all zeros, at a
block-aligned offset.  This zero'ed block is *always* somewhere in a
14-block extent created from using ext4_ext_zeroout().

Lots and lots of tracing showed that these blocks were originally
dirtied metadata blocks from unlinked (hence truncated) non-extent
based files.  These indirect blocks in ext4 have their direct block
pointers turned to zero as the blocks are truncated, and these
indirect blocks are marked as dirty.  Even though bforget() is called
on these metadata blocks, bforget() won't wait on the buffer.  Waiting
on the buffer is meant to happen at allocate time, for "new" buffers.

>
>>       Signed-off-by: Curt Wohlgemuth <curtw@xxxxxxxxxx>
>> ---
>>
>> This is for the problem I reported on 23 Nov ("Bug in extent zeroout: blocks
>> not marked as new").  I'm not seeing the corruption with this fix that I was
>> seeing without it.
>>
>> diff -uprN orig/fs/ext4/extents.c new/fs/ext4/extents.c
>> --- orig/fs/ext4/extents.c    2009-12-09 15:09:25.000000000 -0800
>> +++ new/fs/ext4/extents.c     2009-12-09 15:09:37.000000000 -0800
>> @@ -2474,9 +2474,21 @@ static int ext4_ext_zeroout(struct inode
>>               submit_bio(WRITE, bio);
>>               wait_for_completion(&event);
>>
>> -             if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>> +             /* On success, we need to insure all metadata associated
>
> nitpick, "ensure" I think, although I guess they're mostly synonymous
> today so do with that what you will :)

I'll let Ted decide :-)

>
> -Eric
>
>> +              * with each of these blocks is unmapped. */
>> +             if (test_bit(BIO_UPTODATE, &bio->bi_flags)) {
>> +                     sector_t block = ee_pblock;
>> +
>>                       ret = 0;
>> -             else {
>> +                     done = 0;
>> +                     while (done < len) {
>> +                             unmap_underlying_metadata(inode->i_sb->s_bdev,
>> +                                                       block);
>> +
>> +                             done++;
>> +                             block++;
>> +                     }
>> +             } else {
>>                       ret = -EIO;
>>                       break;
>>               }
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux