On Fri, Aug 08, 2014 at 10:35:38AM -0400, Chris Mason wrote:
> xfs is using truncate_pagecache_range to invalidate the page cache
> during DIO reads.  This is different from the other filesystems, which
> only invalidate pages during DIO writes.
>
> truncate_pagecache_range is meant to be used when we are freeing the
> underlying data structures from disk, so it will zero any partial ranges
> in the page.  This means a DIO read can zero out part of the page cache
> page, and it is possible the page will stay in cache.
>
> Buffered reads will then find an up-to-date page filled with zeros
> instead of the data actually on disk.
>
> This patch fixes things by leaving the page cache alone during DIO
> reads.
>
> We discovered this when our buffered IO program for distributing
> database indexes was finding zero-filled blocks.  I think writes are
> broken too, but I'll leave that for a separate patch because I don't
> fully understand what XFS needs to happen during a DIO write.
>
> Test program:
> ...
>
> Signed-off-by: Chris Mason <clm@xxxxxx>
> cc: stable@xxxxxxxxxxxxxxx
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1f66779..8d25d98 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -295,7 +295,11 @@ xfs_file_read_iter(
>  			xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
>  			return ret;
>  		}
> -		truncate_pagecache_range(VFS_I(ip), pos, -1);
> +
> +		/* we don't remove any pages here. A direct read
> +		 * does not invalidate any contents of the page
> +		 * cache
> +		 */
> 	}

That seems sane to me at first glance.  I don't know why we would need to
completely kill the cache on a DIO read.  I'm not a fan of the additional
comment, though; we should probably just fix up the existing comment
instead.  It also seems like we might be able to kill the XFS_IOLOCK_EXCL
dance here if the truncate goes away.  Dave?
FWIW, I had to go back to the following commit to see where this
originates from:

commit 9cea236492ebabb9545564eb039aa0f477a05c96
Author: Nathan Scott <nathans@xxxxxxx>
Date:   Fri Mar 17 17:26:41 2006 +1100

    [XFS] Flush and invalidate dirty pages at the start of a direct read
    also, else we can hit a delalloc-extents-via-direct-io BUG.

    SGI-PV: 949916
    SGI-Modid: xfs-linux-melb:xfs-kern:25483a

    Signed-off-by: Nathan Scott <nathans@xxxxxxx>
...

That adds a VOP_FLUSHINVAL_PAGES() call that looks like it's some kind of
portability API.  I would expect the flush, rather than the invalidation,
to deal with any delalloc conversion issues, so perhaps the invalidation
part is a historical artifact of that API.  Then again, there's also a
straight 'flushpages' call, so perhaps there's more to it than that.

Brian

> 		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> 	}
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs