On 11/1/21 7:31 PM, Jan Kara wrote:
> On Thu 28-10-21 15:09:08, Joseph Qi wrote:
>> Hi Jan,
>>
>> On 10/25/21 11:13 PM, Jan Kara wrote:
>>> ocfs2_truncate_file() did unmap and invalidate page cache pages before
>>> zeroing the partial tail cluster and setting i_size. Thus some pages
>>> could be left (and likely were left if the cluster zeroing happened) in
>>> the page cache beyond i_size after truncate finished, letting the user
>>> possibly see stale data once the file was extended again. Also the tail
>>> cluster
>>
>> I don't quite understand the case.
>> truncate_inode_pages() will truncate pages from new_i_size to i_size,
>> and the following ocfs2_orphan_for_truncate() will zero the range and
>> then update i_size for the inode as well as the dinode.
>> So once truncate has finished, how does the stale data exposure happen?
>> Or do you mean a race between the above two steps?
>
> Sorry, I was not quite accurate in the above paragraph. There are several
> ways stale pages in the page cache can cause problems.
>
> 1) Because i_size is reduced only after truncating the page cache, a page
> fault can happen after truncating the page cache and zeroing pages but
> before reducing i_size. This allows the user to arbitrarily modify pages
> that are used for writing zeroes into the cluster tail, and after a file
> extension this data will become visible.
>
> 2) The tail cluster zeroing in ocfs2_orphan_for_truncate() can actually
> try to write zeroed pages above i_size (e.g. if we have 4k blocksize, 64k
> clustersize, and do truncate(f, 4k) on a 4k file). This will cause exactly
> the same problems as already described in commit 5314454ea3f "ocfs2: fix
> data corruption after conversion from inline format".
>
> Hope it is clearer now.
>
So the core reason is that ocfs2_zero_range_for_truncate() grabs pages and
then zeroes them, right?
I think an alternative way is to use zeroout instead of zeroing pages,
which won't grab pages again.
Anyway, I'm also fine with your way since it is simple.

Reviewed-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx>
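
For context, here is a minimal sketch of the problematic ordering Jan
describes, not the actual ocfs2 code: truncate_inode_pages() and
i_size_write() are real kernel helpers, while zero_tail_cluster() is a
hypothetical stand-in for the tail-cluster zeroing done through the page
cache by ocfs2_zero_range_for_truncate().

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical stand-in for ocfs2_zero_range_for_truncate(): zeroes the
 * partial tail cluster through the page cache. */
static int zero_tail_cluster(struct inode *inode, loff_t new_i_size);

/* Sketch of the old (buggy) ordering described in the thread above. */
static int ocfs2_truncate_sketch(struct inode *inode, loff_t new_i_size)
{
	/* 1) Page cache beyond new_i_size is dropped first. */
	truncate_inode_pages(inode->i_mapping, new_i_size);

	/*
	 * Race window (case 1 above): i_size still holds the old size,
	 * so a page fault can re-create and dirty pages beyond
	 * new_i_size before, or while, the tail cluster is zeroed below.
	 */

	/*
	 * 2) The partial tail cluster is zeroed through the page cache
	 *    (case 2 above): with 4k blocks and 64k clusters this can
	 *    write pages above the new i_size.
	 */
	zero_tail_cluster(inode, new_i_size);

	/* 3) Only now is i_size reduced. */
	i_size_write(inode, new_i_size);
	return 0;
}

The fix under review reorders these steps so that the tail zeroing and
the i_size update happen before the page cache beyond i_size is
invalidated, closing the window sketched above.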