On Mon, 24 Jan 2022 at 03:52, NeilBrown <neilb@xxxxxxx> wrote:
>
> swap_readpage() is given one page at a time, but may be called repeatedly
> in succession.
> For block-device swapspace, the blk_plug functionality allows the
> multiple pages to be combined together at lower layers.
> That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is
> only active when CONFIG_BLOCK=y.  Consequently all swap reads over NFS
> are single page reads.
>
> With this patch we pass in a pointer-to-pointer where swap_readpage can
> store state between calls - much like the effect of blk_plug.  After
> calling swap_readpage() some number of times, the state will be passed
> to swap_read_unplug() which can submit the combined request.
>
> Some callers currently call blk_finish_plug() *before* the final call to
> swap_readpage(), so the last page cannot be included.  This patch moves
> blk_finish_plug() to after the last call, and calls swap_read_unplug()
> there too.
>
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> ---
>  mm/madvise.c    |    8 +++-
>  mm/memory.c     |    2 +
>  mm/page_io.c    |  102 +++++++++++++++++++++++++++++++++++--------------------
>  mm/swap.h       |   16 +++++++--
>  mm/swap_state.c |   19 +++++++---
>  5 files changed, 98 insertions(+), 49 deletions(-)
>
> ...
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 6e32ca35d9b6..bcf655d650c8 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -390,46 +391,60 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
>  static void sio_read_complete(struct kiocb *iocb, long ret)
>  {
>  	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
> -	struct page *page = sio->bvec.bv_page;
> -
> -	if (ret != 0 && ret != PAGE_SIZE) {
> -		SetPageError(page);
> -		ClearPageUptodate(page);
> -		pr_alert_ratelimited("Read-error on swap-device\n");
> -	} else {
> -		SetPageUptodate(page);
> -		count_vm_event(PSWPIN);
> +	int p;
> +
> +	for (p = 0; p < sio->pages; p++) {
> +		struct page *page = sio->bvec[p].bv_page;
> +		if (ret != 0 && ret != PAGE_SIZE * sio->pages) {
> +			SetPageError(page);
> +			ClearPageUptodate(page);
> +			pr_alert_ratelimited("Read-error on swap-device\n");
> +		} else {
> +			SetPageUptodate(page);
> +			count_vm_event(PSWPIN);
> +		}
> +		unlock_page(page);
>  	}
> -	unlock_page(page);
>  	mempool_free(sio, sio_pool);
>  }

Trivial: on success, could be single call to count_vm_events(PSWPIN, sio->pages).
Similar comment for PSWPOUT in sio_write_complete()

>
> -static int swap_readpage_fs(struct page *page)
> +static void swap_readpage_fs(struct page *page,
> +			     struct swap_iocb **plug)
>  {
>  	struct swap_info_struct *sis = page_swap_info(page);
> -	struct file *swap_file = sis->swap_file;
> -	struct address_space *mapping = swap_file->f_mapping;
> -	struct iov_iter from;
> -	struct swap_iocb *sio;
> +	struct swap_iocb *sio = NULL;
>  	loff_t pos = page_file_offset(page);
> -	int ret;
> -
> -	sio = mempool_alloc(sio_pool, GFP_KERNEL);
> -	init_sync_kiocb(&sio->iocb, swap_file);
> -	sio->iocb.ki_pos = pos;
> -	sio->iocb.ki_complete = sio_read_complete;
> -	sio->bvec.bv_page = page;
> -	sio->bvec.bv_len = PAGE_SIZE;
> -	sio->bvec.bv_offset = 0;
>
> -	iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
> -	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
> -	if (ret != -EIOCBQUEUED)
> -		sio_read_complete(&sio->iocb, ret);
> -	return ret;
> +	if (*plug)
> +		sio = *plug;

'plug' can be NULL when called from do_swap_page();

	if (plug && *plug)

Cheers,
Mark
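
P.S. For illustration only, here is a userspace sketch of the two points
above: a pointer-to-pointer plug that the caller is allowed to pass as NULL,
and a single counter update per completed batch. This is not kernel code.
struct batch, read_page(), read_unplug(), MAX_BATCH and the pswpin variable
are all hypothetical stand-ins (for struct swap_iocb, swap_readpage_fs(),
swap_read_unplug() and the PSWPIN event counter respectively), and the
flush-when-full policy is an assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BATCH 8	/* hypothetical batch limit */

/* Hypothetical stand-in for struct swap_iocb. */
struct batch {
	int pages;			/* pages queued so far */
	int page_ids[MAX_BATCH];	/* stand-in for the bvec array */
};

static unsigned long pswpin;		/* stand-in for the PSWPIN counter */
static unsigned long batches_completed;

static struct batch *batch_alloc(void)
{
	struct batch *b = calloc(1, sizeof(*b));

	if (!b)
		abort();
	return b;
}

/* One counter update for the whole batch, not one per page. */
static void complete_batch(struct batch *b)
{
	pswpin += b->pages;	/* models count_vm_events(PSWPIN, sio->pages) */
	batches_completed++;
	free(b);
}

/* Queue one page. 'plug' may be NULL when the caller does not batch. */
static void read_page(int page_id, struct batch **plug)
{
	struct batch *b = NULL;

	if (plug && *plug)	/* check 'plug' before dereferencing it */
		b = *plug;
	if (!b)
		b = batch_alloc();

	assert(b->pages < MAX_BATCH);
	b->page_ids[b->pages++] = page_id;

	if (!plug) {
		/* Nowhere to keep state between calls: submit immediately. */
		complete_batch(b);
		return;
	}
	if (b->pages == MAX_BATCH) {
		/* Batch is full: submit it and start afresh next time. */
		complete_batch(b);
		b = NULL;
	}
	*plug = b;
}

/* Submit whatever is still pending; safe to call with NULL. */
static void read_unplug(struct batch *plug)
{
	if (plug)
		complete_batch(plug);
}
```

A caller that batches would keep a local "struct batch *plug = NULL", pass
&plug to each read_page() call, and finish with read_unplug(plug). A caller
that reads a single page just passes NULL, which is the case the unguarded
"if (*plug)" test would trip over.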