On Thu, Aug 20, 2020 at 12:53:23PM +0800, Gao Xiang wrote:
> SWP_FS is used to make swap_{read,write}page() go through
> the filesystem, and it's only used for swap files over
> NFS. So, !SWP_FS means non NFS for now, it could be either
> file backed or device backed. Something similar goes with
> legacy SWP_FILE.
>
> So in order to achieve the goal of the original patch,
> SWP_BLKDEV should be used instead.
>
> FS corruption can be observed with SSD device + XFS +
> fragmented swapfile due to CONFIG_THP_SWAP=y.
>
> I reproduced the issue with the following details:
>
> Environment:
>   QEMU + upstream kernel + buildroot + NVMe (2 GB)
>
> Kernel config:
>   CONFIG_BLK_DEV_NVME=y
>   CONFIG_THP_SWAP=y
>
> Some reproducible steps:
>   mkfs.xfs -f /dev/nvme0n1
>   mkdir /tmp/mnt
>   mount /dev/nvme0n1 /tmp/mnt
>   bs="32k"
>   sz="1024m"    # doesn't matter too much, I also tried 16m
>   xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
>   xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
>   xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
>   xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
>   xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
>
>   mkswap /tmp/mnt/sw
>   swapon /tmp/mnt/sw
>
>   stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well
>
> Symptoms:
>  - FS corruption (e.g. checksum failure)
>  - memory corruption at: 0xd2808010
>  - segfault
>
> Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
> Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
> Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> Cc: Rafael Aquini <aquini@xxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: stable <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
> ---
> v1: https://lore.kernel.org/r/20200819195613.24269-1-hsiangkao@xxxxxxxxxx
>
> changes since v1:
>  - improve commit message description
>
> Hi Andrew,
> Kindly consider this one instead if no other concerns...
>
> Thanks,
> Gao Xiang
>
>  mm/swapfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6c26916e95fd..2937daf3ca02 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1074,7 +1074,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  			goto nextsi;
>  		}
>  		if (size == SWAPFILE_CLUSTER) {
> -			if (!(si->flags & SWP_FS))
> +			if (si->flags & SWP_BLKDEV)
>  				n_ret = swap_alloc_cluster(si, swp_entries);
>  		} else
>  			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
> --
> 2.18.1
>

Acked-by: Rafael Aquini <aquini@xxxxxxxxxx>