On Sun, Jun 26, 2022 at 09:15:27PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 22, 2022 at 05:42:11PM -0700, Darrick J. Wong wrote:
> > [resend with shorter 522.out file to keep us under the 300k maximum]
> >
> > On Thu, Dec 16, 2021 at 09:07:15PM +0000, Matthew Wilcox (Oracle) wrote:
> > > Now that iomap has been converted, XFS is large folio safe.
> > > Indicate to the VFS that it can now create large folios for XFS.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_icache.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index da4af2142a2b..cdc39f576ca1 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -87,6 +87,7 @@ xfs_inode_alloc(
> > >  	/* VFS doesn't initialise i_mode or i_state! */
> > >  	VFS_I(ip)->i_mode = 0;
> > >  	VFS_I(ip)->i_state = 0;
> > > +	mapping_set_large_folios(VFS_I(ip)->i_mapping);
> > >
> > >  	XFS_STATS_INC(mp, vn_active);
> > >  	ASSERT(atomic_read(&ip->i_pincount) == 0);
> > > @@ -320,6 +321,7 @@ xfs_reinit_inode(
> > >  	inode->i_rdev = dev;
> > >  	inode->i_uid = uid;
> > >  	inode->i_gid = gid;
> > > +	mapping_set_large_folios(inode->i_mapping);
> >
> > Hmm.  Ever since 5.19-rc1, I've noticed that fsx in generic/522 now
> > reports file corruption after 20 minutes of runtime.  The corruption is
> > surprisingly reproducible (522.out.bad attached below) in that I ran it
> > three times and always got the same bad offset (0x6e000) and always the
> > same opcode (6213798(166 mod 256) MAPREAD).
> >
> > I turned off multipage folios and now 522 has run for over an hour
> > without problems, so before I go do more debugging, does this ring a
> > bell to anyone?
>
> I tried bisecting, but that didn't yield anything productive and
> 5.19-rc4 still fails after 25 minutes; however, it seems that g/522 will
> run without problems for at least 3-4 days after reverting this patch
> from -rc3.

Took 63 million ops and just over 3 hours before it failed here with
a similar 16 byte map read corruption on the first 16 bytes of a
page.

Given the number of fallocate operations that lead up to the failure
- 14 of last 23, plus 3 clone, 2 copy, 2 map read, 1 skip and the
map write that it suggests the stale data came from - this smells of
an invalidation issue...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx