OK... I used David's suggestion, and also put it right in
orangefs_readahead; orangefs_readahead_cleanup is gone. It seems
to work great. I ran it with some printks in it and watched it
do what I think it ought to.

Here's an example of what's upstream in 5.11.8-200.fc33.x86_64:

# dd if=/pvfsmnt/z1 of=/dev/null bs=4194304 count=30
30+0 records in
30+0 records out
125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s

And here's this version of orangefs_readahead on top of 5.12.0-rc4:

# dd if=/pvfsmnt/z1 of=/dev/null bs=4194304 count=30
30+0 records in
30+0 records out
125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s

So now we're getting somewhere :-). I hope readahead_expand will be
upstream soon.

I plan to use inode->i_size and offset to decide how much expansion
is needed on each call to orangefs_readahead; I hope looking at
i_size isn't one of those race condition things I'm always screwing
up on. (There's a rough sketch of that logic at the end of this
message.)

If y'all think the orangefs_readahead below is an OK starting point,
I'll add in the i_size/offset logic so I can get full-sized orangefs
gulps of readahead all the way up to the last whatever-sized fragment
of the file, and run xfstests on it to see if it still seems to be
doing right.

One day, when it is possible, I wish I could figure out how to use
huge pages or something; copying 1024 pages at a time out of the
orangefs internal buffer into the page cache is probably slower than
if I could figure out a way to copy 4194304 bytes out of our buffer
into the page cache at once...

Matthew>> but given that we're talking about doing I/O, probably
Matthew>> not enough to care about.

With orangefs that's almost ALL we care about.

Thanks for your help!

-Mike

static void orangefs_readahead(struct readahead_control *rac)
{
	unsigned int npages;
	loff_t offset;
	struct iov_iter iter;
	struct file *file = rac->file;
	struct inode *inode = file->f_mapping->host;
	struct xarray *i_pages;
	struct page *page;
	int ret;
	loff_t new_start = readahead_index(rac) * PAGE_SIZE;
	size_t new_len = 524288;

	readahead_expand(rac, new_start, new_len);
	npages = readahead_count(rac);
	offset = readahead_pos(rac);
	i_pages = &file->f_mapping->i_pages;

	iov_iter_xarray(&iter, READ, i_pages, offset, npages * PAGE_SIZE);

	/* read in the pages. */
	ret = wait_for_direct_io(ORANGEFS_IO_READ, inode, &offset, &iter,
				 npages * PAGE_SIZE, inode->i_size,
				 NULL, NULL, file);

	/* clean up. */
	while ((page = readahead_page(rac))) {
		page_endio(page, false, 0);
		put_page(page);
	}
}

On Sat, Mar 27, 2021 at 9:57 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Sat, Mar 27, 2021 at 08:31:38AM +0000, David Howells wrote:
> > However, in Mike's orangefs_readahead_cleanup(), he could replace:
> >
> > 	rcu_read_lock();
> > 	xas_for_each(&xas, page, last) {
> > 		page_endio(page, false, 0);
> > 		put_page(page);
> > 	}
> > 	rcu_read_unlock();
> >
> > with:
> >
> > 	while ((page = readahead_page(ractl))) {
> > 		page_endio(page, false, 0);
> > 		put_page(page);
> > 	}
> >
> > maybe?
>
> I'd rather see that than open-coded use of the XArray.  It's mildly
> slower, but given that we're talking about doing I/O, probably not
> enough to care about.
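
P.S. Here's the rough sketch of the i_size/offset logic I mentioned
above, so the last gulp shrinks to whatever is left of the file
instead of expanding past EOF. This is untested and just a sketch:
orangefs_readahead_len() is a name I made up for it, 524288 is the
same max gulp as in the function above, and I'm using i_size_read()
on the theory that it's the safe way to sample i_size:

static size_t orangefs_readahead_len(struct readahead_control *rac)
{
	struct inode *inode = rac->mapping->host;
	loff_t offset = readahead_pos(rac);
	loff_t i_size = i_size_read(inode);
	size_t max_len = 524288;	/* one full-sized orangefs gulp */

	/* nothing past EOF to read ahead. */
	if (offset >= i_size)
		return 0;

	/* shrink the last gulp to the remaining fragment, page aligned. */
	return min_t(loff_t, max_len, round_up(i_size - offset, PAGE_SIZE));
}

Then orangefs_readahead() would do:

	readahead_expand(rac, new_start, orangefs_readahead_len(rac));

instead of always passing the fixed 524288.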