Re: [PATCH 1/2] ocfs2: Fix data corruption on truncate

Jan Kara <jack@xxxxxxx> · Tue, 2 Nov 2021 10:55:42 +0100

On Tue 02-11-21 10:36:42, Joseph Qi wrote:
> On 11/1/21 7:31 PM, Jan Kara wrote:
> > On Thu 28-10-21 15:09:08, Joseph Qi wrote:
> >> Hi Jan,
> >>
> >> On 10/25/21 11:13 PM, Jan Kara wrote:
> >>> ocfs2_truncate_file() did unmap invalidate page cache pages before
> >>> zeroing partial tail cluster and setting i_size. Thus some pages could
> >>> be left (and likely have left if the cluster zeroing happened) in the
> >>> page cache beyond i_size after truncate finished letting user possibly
> >>> see stale data once the file was extended again. Also the tail cluster
> >>
> >> I don't quite understand the case. 
> >> truncate_inode_pages() will truncate pages from new_i_size to i_size,
> >> and the following ocfs2_orphan_for_truncate() will zero range and then
> >> update i_size for inode as well as dinode.
> >> So once truncate finished, how stale data exposing happens? Or do you
> >> mean a race case between the above two steps?
> > 
> > Sorry, I was not quite accurate in the above paragraph. There are several
> > ways how stale pages in the pagecache can cause problems.
> > 
> > 1) Because i_size is reduced after truncating page cache, page fault can
> > happen after truncating page cache and zeroing pages but before reducing i_size.
> > This will in allow user to arbitrarily modify pages that are used for
> > writing zeroes into the cluster tail and after file extension these data
> > will become visible.
> > 
> > 2) The tail cluster zeroing in ocfs2_orphan_for_truncate() can actually try
> > to write zeroed pages above i_size (e.g. if we have 4k blocksize, 64k
> > clustersize, and do truncate(f, 4k) on a 4k file). This will cause exactly
> > same problems as already described in commit 5314454ea3f "ocfs2: fix data
> > corruption after conversion from inline format".
> > 
> > Hope it is clearer now.
> > 
> So the core reason is ocfs2_zero_range_for_truncate() grabs pages and
> then zero, right?

Well, that is the part that makes things easy to reproduce.

> I think an alternative way is using zeroout instead of zero pages, which
> won't grab pages again.

That would certainly reduce the likelyhood of problems but it is always
problematic to first truncate page cache and only then update i_size.
For OCFS2 racing page faults can recreate pages in the page cache before
i_size is reduced and thus cause "interesting" problems.

> Anyway, I'm also fine with your way since it is simple.
> 
> Reviewed-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>

Thanks!

									Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR