On Thu, 2022-05-05 at 09:41 +1000, NeilBrown wrote:
> On Tue, 03 May 2022, Yang Shi wrote:
> > On Sun, May 1, 2022 at 9:23 PM NeilBrown <neilb@xxxxxxx> wrote:
> > >
> > > On Sat, 30 Apr 2022, Yang Shi wrote:
> > > > On Thu, Apr 28, 2022 at 5:44 PM NeilBrown <neilb@xxxxxxx> wrote:
> > > > >
> > > > > Pages passed to swap_readpage()/swap_writepage() are not necessarily all
> > > > > the same size - there may be transparent-huge-pages involved.
> > > > >
> > > > > The BIO paths of swap_*page() handle this correctly, but the SWP_FS_OPS
> > > > > path does not.
> > > > >
> > > > > So we need to use thp_size() to find the size, not just assume
> > > > > PAGE_SIZE, and we need to track the total length of the request, not
> > > > > just assume it is "page * PAGE_SIZE".
> > > >
> > > > Swap-over-nfs doesn't support THP swap IIUC. So SWP_FS_OPS should not
> > > > see THP at all. But I agree to remove the assumption about page size
> > > > in this path.
> > >
> > > Can you help me understand this please.  How would the swap code know
> > > that swap-over-NFS doesn't support THP swap?  There is no reason that
> > > NFS wouldn't be able to handle 2MB writes.  Even 1GB should work though
> > > NFS would have to split into several smaller WRITE requests.
> >
> > AFAICT, THP swap is only supported on non-rotate block devices, for
> > example, SSD, PMEM, etc. IIRC, the swap device has to support the
> > cluster in order to swap THP. The cluster is only supported by
> > non-rotate block devices.
> >
> > Looped Ying in, who is the author of THP swap.
>
> I hunted around the code and found that THP swap only happens if a
> 'cluster_info' is allocated, and that only happens if
> 	if (p->bdev && bdev_nonrot(p->bdev)) {
> in the swapon syscall.

And in get_swap_pages(), the cluster is only allocated for block devices.

		if (size == SWAPFILE_CLUSTER) {
			if (si->flags & SWP_BLKDEV)
				n_ret = swap_alloc_cluster(si, swp_entries);
		} else
			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
						    n_goal, swp_entries);

We may remove this restriction in the future if someone can show the
benefit.

Best Regards,
Huang, Ying

> I guess "nonrot" is being used as a synonym for "low latency"...
> So even if NFS was low-latency it couldn't benefit from THP swap.
>
> So as you say it is not currently possible for THP pages to be sent to
> NFS for swapout.  It makes sense to prepare for it though I think - if
> only so that the code is more consistent and less confusing.
>
> Thanks,
> NeilBrown
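
For readers following the thread, the two gates discussed above (the
cluster_info check in swapon and the SWP_BLKDEV check in
get_swap_pages()) can be modelled in a few lines of userspace C.  This
is only a sketch, not kernel code: struct demo_swap_dev, the DEMO_*
flag values and the demo_*() helpers are all hypothetical names made up
for illustration, and the real checks involve more state than is shown
here.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
enum {
	DEMO_SWP_BLKDEV = 1 << 0,	/* swap area is a block device */
	DEMO_SWP_FS_OPS = 1 << 1,	/* swap I/O goes through filesystem ops */
};

/* Hypothetical, heavily reduced stand-in for a swap area descriptor. */
struct demo_swap_dev {
	unsigned int flags;
	bool has_bdev;	/* models "p->bdev" being non-NULL */
	bool nonrot;	/* models "bdev_nonrot(p->bdev)" */
};

/* Gate 1 (swapon): cluster_info exists only on a non-rotational bdev. */
static bool demo_has_cluster_info(const struct demo_swap_dev *d)
{
	return d->has_bdev && d->nonrot;
}

/* Gate 2 (get_swap_pages): a cluster is only allocated for SWP_BLKDEV. */
static bool demo_can_alloc_cluster(const struct demo_swap_dev *d)
{
	return (d->flags & DEMO_SWP_BLKDEV) != 0;
}

/* THP swap-out needs both gates to pass in this simplified model. */
static bool demo_can_swap_thp(const struct demo_swap_dev *d)
{
	return demo_has_cluster_info(d) && demo_can_alloc_cluster(d);
}
```

In this model a swap area with only DEMO_SWP_FS_OPS set and no block
device (the swap-over-NFS case) fails both gates, a rotational block
device fails the first, and only a non-rotational block device passes
both.  That matches the conclusion in the thread that the SWP_FS_OPS
path cannot currently see a THP.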