Re: writev data loss bug in (at least) 2.6.31 and 2.6.32pre8 x86-64

James Y Knight <foom@xxxxxxxx> · Wed, 2 Dec 2009 16:24:01 -0500

On Dec 1, 2009, at 11:03 AM, Jan Kara wrote:
> On Tue 01-12-09 15:35:59, Jan Kara wrote:
>> On Tue 01-12-09 12:42:45, Mike Galbraith wrote:
>>> I bisected it this morning.  Bisected cleanly to...
>>> 
>>> 9eaaa2d5759837402ec5eee13b2a97921808c3eb is the first bad commit
>>  OK, I've debugged it. This commit is really at fault. The problem is
>> the following:
>>  When using writev, the page we copy from is not paged in (while when we
>> use ordinary write, it is paged in). This difference might be worth
>> investigation on its own (as it is likely to heavily impact performance of
>> writev) but is irrelevant for us now - we should handle this without data
>> corruption anyway. Because the source page is not available, we pass 0 as
>> the number of copied bytes to write_end and thus ext3_write_end decides to
>> truncate the file to original size. This is perfectly fine. The problem is
>> that we do this via ext3_truncate(), which just frees the corresponding block
>> but does not unmap the buffers. So we leave mapped buffers beyond i_size (they
>> actually never were inside i_size) but the blocks they are mapped to are
>> already free. The write is then retried (after mapping the page),
>> block_write_begin() sees the buffer is mapped (although it is beyond
>> i_size) and thus it does not call ext3_get_block() anymore. So as a result,
>> data is written to a block that is no longer allocated to the file. Bummer
>> - welcome filesystem corruption.
>>  Ext4 also has this problem but delayed allocation mitigates the effect to
>> an error in accounting of blocks reserved for delayed allocation and thus
>> under normal circumstances nothing bad happens.
>>  The question is how to solve this in the cleanest way. We can call
>> vmtruncate() instead of ext3_truncate() as we used to do but Nick wants to
>> get rid of that (that's why I originally changed the code to what it is
>> now). So probably we could just manually call truncate_pagecache() instead.
>> Nick, I think your truncate calling sequence patch set needs a similar fix
>> for all filesystems as well.
>  The patch below fixes the issue for me...

Thank you! I can confirm that the patch fixes the issue in my real application as well.
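
For anyone who wants to check their own kernel, here is a minimal sketch of the kind of round-trip test that exercises the path Jan describes: writev() from a source buffer whose pages have not been touched by the process yet. It is not the test case from this thread. The helper name writev_roundtrip(), the paths, and the 64 KiB size are hypothetical, and using a fresh file-backed mapping as the "not paged in" source is an assumption; it may or may not hit the bug on an affected kernel. On a fixed kernel it should return 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

enum { SKETCH_LEN = 64 * 1024 };   /* hypothetical size, several pages */

/* Deterministic pattern so the destination can be verified byte by byte. */
static unsigned char pattern_byte(size_t i)
{
    return (unsigned char)(i * 31u + 7u);
}

/*
 * Hypothetical helper: fill src_path with a pattern, map it without
 * touching the mapping, writev() the mapping into dst_path in two
 * segments, then read dst_path back and compare.
 * Returns 0 if the data round-trips intact, -1 on any error or mismatch.
 * Not reentrant (uses a static scratch buffer).
 */
int writev_roundtrip(const char *src_path, const char *dst_path)
{
    static unsigned char buf[SKETCH_LEN];
    struct iovec iov[2];
    unsigned char *map = MAP_FAILED;
    int src = -1, dst = -1, rc = -1;
    size_t i, half = SKETCH_LEN / 2;

    for (i = 0; i < SKETCH_LEN; i++)
        buf[i] = pattern_byte(i);

    src = open(src_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (src < 0)
        goto out;
    if (write(src, buf, SKETCH_LEN) != SKETCH_LEN)
        goto out;

    /* Fresh mapping: the process has not faulted these pages in yet. */
    map = mmap(NULL, SKETCH_LEN, PROT_READ, MAP_PRIVATE, src, 0);
    if (map == MAP_FAILED)
        goto out;

    dst = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (dst < 0)
        goto out;

    iov[0].iov_base = map;
    iov[0].iov_len  = half;
    iov[1].iov_base = map + half;
    iov[1].iov_len  = SKETCH_LEN - half;
    assert(iov[0].iov_len + iov[1].iov_len == SKETCH_LEN);

    if (writev(dst, iov, 2) != SKETCH_LEN)
        goto out;

    /* Read the destination back through the file, not the mapping. */
    memset(buf, 0, sizeof buf);
    if (lseek(dst, 0, SEEK_SET) != 0)
        goto out;
    if (read(dst, buf, SKETCH_LEN) != SKETCH_LEN)
        goto out;

    for (i = 0; i < SKETCH_LEN; i++)
        if (buf[i] != pattern_byte(i))
            goto out;

    rc = 0;
out:
    if (map != MAP_FAILED)
        munmap(map, SKETCH_LEN);
    if (dst >= 0)
        close(dst);
    if (src >= 0)
        close(src);
    unlink(src_path);
    unlink(dst_path);
    return rc;
}
```

Calling writev_roundtrip() with two paths on the filesystem under test and checking for a 0 return is enough. Note that reading back immediately is served from the page cache, so to see on-disk damage of the kind described above you would likely need to drop caches or remount and run fsck before comparing.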

James