On 11/1/21 7:31 PM, Jan Kara wrote:
> On Thu 28-10-21 15:09:08, Joseph Qi wrote:
>> Hi Jan,
>>
>> On 10/25/21 11:13 PM, Jan Kara wrote:
>>> ocfs2_truncate_file() did unmap and invalidate page cache pages before
>>> zeroing the partial tail cluster and setting i_size. Thus some pages
>>> could be left (and likely were left if the cluster zeroing happened) in
>>> the page cache beyond i_size after truncate finished, letting the user
>>> possibly see stale data once the file was extended again. Also the tail
>>> cluster
>>
>> I don't quite understand the case.
>> truncate_inode_pages() will truncate pages from new_i_size to i_size,
>> and the following ocfs2_orphan_for_truncate() will zero the range and
>> then update i_size for the inode as well as the dinode.
>> So once truncate has finished, how does the stale data exposure happen?
>> Or do you mean a race between the above two steps?
>
> Sorry, I was not quite accurate in the above paragraph. There are several
> ways stale pages in the page cache can cause problems.
>
> 1) Because i_size is reduced only after truncating the page cache, a page
> fault can happen after truncating the page cache and zeroing pages but
> before reducing i_size. This allows the user to arbitrarily modify pages
> that are used for writing zeroes into the cluster tail, and after a file
> extension this data will become visible.
>
> 2) The tail cluster zeroing in ocfs2_orphan_for_truncate() can actually
> try to write zeroed pages above i_size (e.g. if we have 4k blocksize, 64k
> clustersize, and do truncate(f, 4k) on a 4k file). This will cause exactly
> the same problems as already described in commit 5314454ea3f "ocfs2: fix
> data corruption after conversion from inline format".
>
> Hope it is clearer now.
>
So the core reason is that ocfs2_zero_range_for_truncate() grabs pages and
then zeroes them, right?
I think an alternative way is to use zeroout instead of zeroing pages,
which won't grab pages again.
Anyway, I'm also fine with your way since it is simple.

Reviewed-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
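
For context, here is a minimal sketch of the problematic ordering Jan
describes, not the actual ocfs2 code: truncate_inode_pages() and
i_size_write() are real kernel helpers, while zero_tail_cluster() is a
hypothetical stand-in for the tail-cluster zeroing done through the page
cache by ocfs2_zero_range_for_truncate().

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical stand-in for ocfs2_zero_range_for_truncate(): zeroes the
 * partial tail cluster through the page cache. */
static int zero_tail_cluster(struct inode *inode, loff_t new_i_size);

/* Sketch of the old (buggy) ordering described in the thread above. */
static int ocfs2_truncate_sketch(struct inode *inode, loff_t new_i_size)
{
	/* 1) Page cache beyond new_i_size is dropped first. */
	truncate_inode_pages(inode->i_mapping, new_i_size);

	/*
	 * Race window (case 1 above): i_size still holds the old size,
	 * so a page fault can re-create and dirty pages beyond
	 * new_i_size before, or while, the tail cluster is zeroed below.
	 */

	/*
	 * 2) The partial tail cluster is zeroed through the page cache
	 *    (case 2 above): with 4k blocks and 64k clusters this can
	 *    write pages above the new i_size.
	 */
	zero_tail_cluster(inode, new_i_size);

	/* 3) Only now is i_size reduced. */
	i_size_write(inode, new_i_size);
	return 0;
}

The fix under review reorders these steps so that the tail zeroing and
the i_size update happen before the page cache beyond i_size is
invalidated, closing the window sketched above.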