On Wed, Feb 12, 2020 at 08:38:52PM -0800, Andrew Morton wrote: > On Fri, 24 Jan 2020 17:35:41 -0800 Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> > > > > This series adds a readahead address_space operation to eventually > > replace the readpages operation. The key difference is that > > pages are added to the page cache as they are allocated (and > > then looked up by the filesystem) instead of passing them on a > > list to the readpages operation and having the filesystem add > > them to the page cache. It's a net reduction in code for each > > implementation, more efficient than walking a list, and solves > > the direct-write vs buffered-read problem reported by yu kuai at > > https://lore.kernel.org/linux-fsdevel/20200116063601.39201-1-yukuai3@xxxxxxxxxx/ > > Unclear which patch fixes this and how it did it? I suppose the problem isn't fixed until patch 13/13 is applied. What yu kuai is seeing is a race where readahead allocates a page, then passes it to iomap_readpages, which calls xfs_read_iomap_begin() which looks up the extent. Then thread 2 does DIO which modifies the extent, because there's nothing to say that thread 1 is still using it. With this patch series, the readpages code puts the locked pages in the cache before calling iomap_readpages, so any racing write will block on the locked page until readahead is completed. If you're tempted to put this into -mm, I have a couple of new changes; one to fix a kernel-doc warning for mpage_readahead() and one to add kernel-doc for iomap_readahead(): +++ b/fs/mpage.c @@ -339,9 +339,7 @@ /** * mpage_readahead - start reads against pages - * @mapping: the address_space - * @start: The number of the first page to read. - * @nr_pages: The number of consecutive pages to read. + * @rac: Describes which pages to read. * @get_block: The filesystem's block mapper function. * * This function walks the pages and the blocks within each page, building and +++ b/fs/iomap/buffered-io.c @@ -395,6 +395,21 @@ return done; } +/** + * iomap_readahead - Attempt to read pages from a file. + * @rac: Describes the pages to be read. + * @ops: The operations vector for the filesystem. + * + * This function is for filesystems to call to implement their readahead + * address_space operation. + * + * Context: The file is pinned by the caller, and the pages to be read are + * all locked and have an elevated refcount. This function will unlock + * the pages (once I/O has completed on them, or I/O has been determined to + * not be necessary). It will also decrease the refcount once the pages + * have been submitted for I/O. After this point, the page may be removed + * from the page cache, and should not be referenced. + */ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops) { struct inode *inode = rac->mapping->host; I'll do a v6 with those changes soon, but I would really like a bit more review from filesystem people, particularly ocfs2 and gfs2.