On 2024/6/3 6:46, Dave Chinner wrote:
> On Wed, May 29, 2024 at 05:52:03PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> When truncating down an inode, we call xfs_truncate_page() to zero out
>> the tail partial block that lies beyond the new EOF, which prevents
>> exposing stale data. But xfs_truncate_page() always assumes the block
>> size is i_blocksize(inode), which is not always true: if a file has a
>> large allocation unit, zeroing should be aligned to that unit size,
>> e.g. a realtime inode should be aligned to the rtextsize.
>>
>> The current xfs_setattr_size() can't zero out a large aligned range on
>> truncate down because the operations happen in the wrong order. We
>> first zero out through xfs_truncate_page(), and then immediately update
>> the inode size through truncate_setsize(). If the zeroed range is
>> larger than a folio, the writeback path will not write back zeroed
>> pagecache beyond the EOF folio, so zeroes are not written to the entire
>> tail extent, and stale data could be exposed after an appending write
>> into the next aligned extent.
>>
>> We need to adjust the order to zero out the tail aligned blocks, write
>> back the zeroed or cached data, update i_size, and drop the cache
>> beyond the aligned EOF block, in preparation for fixing the realtime
>> inode case and supporting the upcoming forced alignment feature.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
>> ---
> .....
>> @@ -853,30 +854,7 @@ xfs_setattr_size(
>>  	 * the transaction because the inode cannot be unlocked once it is a
>>  	 * part of the transaction.
>>  	 *
>> -	 * Start with zeroing any data beyond EOF that we may expose on file
>> -	 * extension, or zeroing out the rest of the block on a downward
>> -	 * truncate.
>> - */ >> - if (newsize > oldsize) { >> - trace_xfs_zero_eof(ip, oldsize, newsize - oldsize); >> - error = xfs_zero_range(ip, oldsize, newsize - oldsize, >> - &did_zeroing); >> - } else if (newsize != oldsize) { >> - error = xfs_truncate_page(ip, newsize, &did_zeroing); >> - } >> - >> - if (error) >> - return error; >> - >> - /* >> - * We've already locked out new page faults, so now we can safely remove >> - * pages from the page cache knowing they won't get refaulted until we >> - * drop the XFS_MMAP_EXCL lock after the extent manipulations are >> - * complete. The truncate_setsize() call also cleans partial EOF page >> - * PTEs on extending truncates and hence ensures sub-page block size >> - * filesystems are correctly handled, too. >> - * >> - * We have to do all the page cache truncate work outside the >> + * And we have to do all the page cache truncate work outside the >> * transaction context as the "lock" order is page lock->log space >> * reservation as defined by extent allocation in the writeback path. >> * Hence a truncate can fail with ENOMEM from xfs_trans_alloc(), but > ...... > > Lots of new logic for zeroing here. That makes xfs_setattr_size() > even longer than it already is. Can you lift this EOF zeroing logic > into it's own helper function so that it is clear that it is a > completely independent operation to the actual transaction that > changes the inode size. That would also allow the operations to be > broken up into: > > if (newsize >= oldsize) { > /* do the simple stuff */ > .... > return error; > } > /* do the complex size reduction stuff without additional indenting */ > Sure, I will try to factor them out. Thanks, Yi.