On Thu, Dec 26, 2019 at 09:47:21PM +0800, yu kuai wrote: > In xfs_file_fallocate, when punch hole, zero range or collapse range is > performed, xfs_fulsh_unmap_range() need to be called first. However, > xfs_map_blocks will convert the whole extent to real, even if there are > some blocks not related. Furthermore, the unrelated blocks will hold stale > data since xfs_fulsh_unmap_range didn't flush the correspond dirty pages > to disk. > > In this case, if user shutdown file system through xfsioctl with cmd > 'XFS_IOC_GOINGDOWN' and arg 'XFS_FSOP_GOING_FLAGS_LOGFLUSH'. All the > completed transactions will be flushed to disk, while dirty pages will > never be flushed to disk. And after remount, the file will hold stale > data. Waitaminute, what problem are you trying to solve? You have a file with a huge delalloc extent because we just wrote a bunch of 'X' characters to part of a file: ---dddddddddddddddd Then you want to fallocate or something in the middle of that: ---dddddddddddddddd ^^^^------ collapse range these blocks So we xfs_flush_unmap_range to kill the pagecache on that range: ---dddddddddddddddd ^^^^------ xfs_flush_unmap_range() This triggers writeback, which can convert the entire delalloc range to a single extent: ---rrrrrrrrrrrrrrrr ^^^^^^^^-- This is the range we are writing back ^^^^^^^^---------- This range doesn't undergo writeback, but we wrote the extent tree anyway After committing that update to the log, the fs goes down, which leaves us with the following after we reboot, mount, and recover the fs: ---rrrrrrrrrrrrrrrr ^^^^^^^^-- This part contains 'X' ^^^^^^^^---------- This range never underwent writeback, so it's full of junk from the previous owner of the space So your solution is to split the delalloc reservation to constrain the allocation to the range that's being operated on? If so, I think a better solution (at least from the perspective of reducing fragmentation) would be to map the extent unwritten and force a post-writeback conversion[1] but I got shot down for performance reasons the last time I suggested that. --D [1] https://lore.kernel.org/linux-xfs/155259894630.30230.10064390935593758177.stgit@magnolia/ > Fix the problem by spliting delalloc extent before xfs_flush_unmap_range > is called. > > Signed-off-by: yu kuai <yukuai3@xxxxxxxxxx> > --- > fs/xfs/xfs_file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 47 insertions(+) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index c93250108952..5398102feec9 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -786,6 +786,50 @@ xfs_break_layouts( > > return error; > } > +int > +try_split_da_extent( > + struct xfs_inode *ip, > + loff_t offset, > + loff_t len) > +{ > + struct xfs_mount *mp = ip->i_mount; > + xfs_fileoff_t start = XFS_B_TO_FSBT(mp, offset); > + xfs_fileoff_t end = XFS_B_TO_FSBT(mp, offset + len - 1); > + struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); > + struct xfs_iext_cursor cur; > + struct xfs_bmbt_irec imap; > + int error; > + > + /* > + * if start belong to a delalloc extent and it's not the first block, > + * split the extent at start. > + */ > + if (xfs_iext_lookup_extent(ip, ifp, start, &cur, &imap) && > + imap.br_startblock != HOLESTARTBLOCK && > + isnullstartblock(imap.br_startblock) && > + start > imap.br_startoff) { > + error = xfs_bmap_split_da_extent(ip, start); > + if (error) > + return error; > + ip->i_d.di_nextents--; > + } > + > + /* > + * if end + 1 belong to a delalloc extent and it's not the first block, > + * split the extent at end + 1. > + */ > + if (xfs_iext_lookup_extent(ip, ifp, end + 1, &cur, &imap) && > + imap.br_startblock != HOLESTARTBLOCK && > + isnullstartblock(imap.br_startblock) && > + end + 1 > imap.br_startoff) { > + error = xfs_bmap_split_da_extent(ip, end + 1); > + if (error) > + return error; > + ip->i_d.di_nextents--; > + } > + > + return 0; > +} > > #define XFS_FALLOC_FL_SUPPORTED \ > (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \ > @@ -842,6 +886,9 @@ xfs_file_fallocate( > */ > if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | > FALLOC_FL_COLLAPSE_RANGE)) { > + error = try_split_da_extent(ip, offset, len); > + if (error) > + goto out_unlock; > error = xfs_flush_unmap_range(ip, offset, len); > if (error) > goto out_unlock; > -- > 2.17.2 >