From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> I think everybody hates the readpages API. The fundamental problem with it is that it passes the pages to be read on a doubly linked list, using the ->lru list in the struct page. That means the filesystems have to do the work of calling add_to_page_cache{,_lru,_locked}, and handling failures (because another task is also accessing that chunk of the file, and so it fails). This is an attempt to add a ->readahead op to replace ->readpages. I've converted two users, iomap/xfs and cifs. The cifs conversion is lacking fscache support, and that's just because I didn't want to do that work; I don't believe there's anything fundamental to it. But I wanted to do iomap because it is The Infrastructure Of The Future and cifs because it is the sole remaining user of add_to_page_cache_locked(), which enables the last two patches in the series. By the way, that gives CIFS access to the workingset shadow infrastructure, which it had to ignore before because it couldn't put pages onto the lru list at the right time. The fundamental question is, how do we indicate to the implementation of ->readahead what pages to operate on? I've gone with passing a pagevec. This has the obvious advantage that it's a data structure that already exists and is used within filemap for batches of pages. I had to add a bit of new infrastructure to support iterating over the pages in the pagevec, but with that done, it's quite nice. I think the biggest problem is that the size of the pagevec is limited to 15 pages (60kB). So that'll mean that if the readahead window bumps all the way up to 256kB, we may end up making 5 BIOs (and merging them) instead of one. I'd kind of like to be able to allocate variable length pagevecs while allowing regular pagevecs to be allocated on the stack, but I can't figure out a way to do that. eg this doesn't work: - struct page *pages[PAGEVEC_SIZE]; + union { + struct page *pages[PAGEVEC_SIZE]; + struct page *_pages[]; + } and if we just allocate them, useful and wonderful tools are going to point out when pages[16] is accessed that we've overstepped the end of the array. I have considered alternatives to the pagevec like just having the ->readahead implementation look up the pages in the i_pages XArray directly. That didn't work out too well. Anyway, I want to draw your attention to the diffstat below. Net 91 lines deleted, and that's with adding all the infrastructure for ->readahead and getting rid of none of the infrastructure for ->readpages. There's probably a good couple of hundred lines of code to be deleted there. Matthew Wilcox (Oracle) (8): pagevec: Add an iterator mm: Fix the return type of __do_page_cache_readahead mm: Use a pagevec for readahead mm/fs: Add a_ops->readahead iomap,xfs: Convert from readpages to readahead cifs: Convert from readpages to readahead mm: Remove add_to_page_cache_locked mm: Unify all add_to_page_cache variants Documentation/filesystems/locking.rst | 8 +- Documentation/filesystems/vfs.rst | 9 ++ fs/cifs/file.c | 125 ++++------------------- fs/iomap/buffered-io.c | 60 +++-------- fs/iomap/trace.h | 18 ++-- fs/xfs/xfs_aops.c | 12 +-- include/linux/fs.h | 3 + include/linux/iomap.h | 4 +- include/linux/pagemap.h | 23 +---- include/linux/pagevec.h | 20 ++++ mm/filemap.c | 72 ++++--------- mm/internal.h | 2 +- mm/readahead.c | 141 +++++++++++++++++--------- 13 files changed, 203 insertions(+), 294 deletions(-) -- 2.24.1