On Tue, 25 Jan 2022, Mark Hemment wrote:
> On Mon, 24 Jan 2022 at 03:52, NeilBrown <neilb@xxxxxxx> wrote:
> >
> > swap_readpage() is given one page at a time, but may be called repeatedly
> > in succession.
> > For block-device swapspace, the blk_plug functionality allows the
> > multiple pages to be combined together at lower layers.
> > That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is
> > only active when CONFIG_BLOCK=y.  Consequently all swap reads over NFS
> > are single page reads.
> >
> > With this patch we pass in a pointer-to-pointer where swap_readpage can
> > store state between calls - much like the effect of blk_plug.  After
> > calling swap_readpage() some number of times, the state will be passed
> > to swap_read_unplug() which can submit the combined request.
> >
> > Some callers currently call blk_finish_plug() *before* the final call to
> > swap_readpage(), so the last page cannot be included.  This patch moves
> > blk_finish_plug() to after the last call, and calls swap_read_unplug()
> > there too.
> >
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  mm/madvise.c    |    8 +++-
> >  mm/memory.c     |    2 +
> >  mm/page_io.c    |  102 +++++++++++++++++++++++++++++++++++--------------------
> >  mm/swap.h       |   16 +++++++--
> >  mm/swap_state.c |   19 +++++++---
> >  5 files changed, 98 insertions(+), 49 deletions(-)
> >
> ...
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index 6e32ca35d9b6..bcf655d650c8 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -390,46 +391,60 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
> >  static void sio_read_complete(struct kiocb *iocb, long ret)
> >  {
> >          struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
> > -        struct page *page = sio->bvec.bv_page;
> > -
> > -        if (ret != 0 && ret != PAGE_SIZE) {
> > -                SetPageError(page);
> > -                ClearPageUptodate(page);
> > -                pr_alert_ratelimited("Read-error on swap-device\n");
> > -        } else {
> > -                SetPageUptodate(page);
> > -                count_vm_event(PSWPIN);
> > +        int p;
> > +
> > +        for (p = 0; p < sio->pages; p++) {
> > +                struct page *page = sio->bvec[p].bv_page;
> > +                if (ret != 0 && ret != PAGE_SIZE * sio->pages) {
> > +                        SetPageError(page);
> > +                        ClearPageUptodate(page);
> > +                        pr_alert_ratelimited("Read-error on swap-device\n");
> > +                } else {
> > +                        SetPageUptodate(page);
> > +                        count_vm_event(PSWPIN);
> > +                }
> > +                unlock_page(page);
> >          }
> > -        unlock_page(page);
> >          mempool_free(sio, sio_pool);
> >  }
>
> Trivial: on success, could be single call to count_vm_events(PSWPIN,
> sio->pages).
> Similar comment for PSWPOUT in sio_write_complete()
>
> >
> > -static int swap_readpage_fs(struct page *page)
> > +static void swap_readpage_fs(struct page *page,
> > +                             struct swap_iocb **plug)
> >  {
> >          struct swap_info_struct *sis = page_swap_info(page);
> > -        struct file *swap_file = sis->swap_file;
> > -        struct address_space *mapping = swap_file->f_mapping;
> > -        struct iov_iter from;
> > -        struct swap_iocb *sio;
> > +        struct swap_iocb *sio = NULL;
> >          loff_t pos = page_file_offset(page);
> > -        int ret;
> > -
> > -        sio = mempool_alloc(sio_pool, GFP_KERNEL);
> > -        init_sync_kiocb(&sio->iocb, swap_file);
> > -        sio->iocb.ki_pos = pos;
> > -        sio->iocb.ki_complete = sio_read_complete;
> > -        sio->bvec.bv_page = page;
> > -        sio->bvec.bv_len = PAGE_SIZE;
> > -        sio->bvec.bv_offset = 0;
> >
> > -        iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
> > -        ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
> > -        if (ret != -EIOCBQUEUED)
> > -                sio_read_complete(&sio->iocb, ret);
> > -        return ret;
> > +        if (*plug)
> > +                sio = *plug;
>
> 'plug' can be NULL when called from do_swap_page();
>         if (plug && *plug)

Thanks for catching that!  I actually want it to be

        if (plug)
                sio = *plug;

which nicely balances the

        if (plug)
                *plug = sio;

at the end of the function.

Thanks,
NeilBrown
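
For anyone following along, here is a rough userspace sketch of the plug
pattern discussed above.  It is not the kernel code: struct batch,
read_page(), read_unplug(), submit() and MAX_BATCH are hypothetical
stand-ins for struct swap_iocb, swap_readpage_fs(), swap_read_unplug(),
the ->swap_rw submission and whatever batch limit the real patch uses.
It only illustrates the balanced "if (plug)" handling at entry and exit,
so that a NULL plug (as from do_swap_page()) degrades to a single-page
request.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BATCH 8	/* hypothetical limit, not taken from the patch */

/* Userspace stand-in for struct swap_iocb: pages gathered for one request. */
struct batch {
	int pages;
	int page_ids[MAX_BATCH];
};

static int submitted_requests;	/* number of combined requests issued */
static int submitted_pages;	/* total pages covered by those requests */

/* Stand-in for handing the combined request to the filesystem. */
static void submit(struct batch *b)
{
	submitted_requests++;
	submitted_pages += b->pages;
	free(b);
}

/*
 * plug may be NULL (caller does not batch) and *plug may be NULL
 * (no batch started yet).  Both cases are handled.
 */
static void read_page(int page_id, struct batch **plug)
{
	struct batch *b = NULL;

	if (plug)
		b = *plug;		/* continue an existing batch, if any */
	if (!b) {
		b = calloc(1, sizeof(*b));
		if (!b)
			abort();
	}
	assert(b->pages < MAX_BATCH);
	b->page_ids[b->pages++] = page_id;

	/* Submit now if the batch is full or the caller is not plugging. */
	if (b->pages == MAX_BATCH || !plug) {
		submit(b);
		b = NULL;
	}
	if (plug)
		*plug = b;		/* balances the read at the top */
}

/* Flush whatever is still pending after the last read_page() call. */
static void read_unplug(struct batch *plug)
{
	if (plug)
		submit(plug);
}
```

A caller that batches would start with a NULL struct batch pointer, pass
its address to each read_page() call, and finish with read_unplug(); a
caller that passes NULL gets one request per page.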