Hi Ying,

On Thu, Aug 20, 2020 at 12:36:08PM +0800, Huang, Ying wrote:
> Gao Xiang <hsiangkao@xxxxxxxxxx> writes:
>
> > SWP_FS doesn't mean the device is file-backed swap device,
> > which just means each writeback request should go through fs
> > by DIO. Or it'll just use extents added by .swap_activate(),
> > but it also works as file-backed swap device.
> >
> > So in order to achieve the goal of the original patch,
> > SWP_BLKDEV should be used instead.
> >
> > FS corruption can be observed with SSD device + XFS +
> > fragmented swapfile due to CONFIG_THP_SWAP=y.
> >
> > Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device")
> > Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
> > Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
> > Cc: stable <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Gao Xiang <hsiangkao@xxxxxxxxxx>
>
> Good catch!  The fix itself looks good to me!  Although the description is
> a little confusing.
>
> After some digging, it seems that SWP_FS is set on the swap devices
> which make swap entry read/write go through the file system specific
> callback (now used by swap over NFS only).
Okay, let me send out v2 with the updated commit message in
https://lore.kernel.org/r/20200820012409.GB5846@xxxxxxxxxxxxxxxxxx/

Thanks,
Gao Xiang

>
> Best Regards,
> Huang, Ying
>
> > ---
> >
> > I reproduced the issue with the following details:
> >
> > Environment:
> >  QEMU + upstream kernel + buildroot + NVMe (2 GB)
> >
> > Kernel config:
> >  CONFIG_BLK_DEV_NVME=y
> >  CONFIG_THP_SWAP=y
> >
> > Some reproducible steps:
> >  mkfs.xfs -f /dev/nvme0n1
> >  mkdir /tmp/mnt
> >  mount /dev/nvme0n1 /tmp/mnt
> >  bs="32k"
> >  sz="1024m"    # doesn't matter too much, I also tried 16m
> >  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> >  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> >  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> >  xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw
> >  xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw
> >
> >  mkswap /tmp/mnt/sw
> >  swapon /tmp/mnt/sw
> >
> >  stress --vm 2 --vm-bytes 600M   # doesn't matter too much as well
> >
> > Symptoms:
> >  - FS corruption (e.g. checksum failure)
> >  - memory corruption at: 0xd2808010
> >  - segfault
> >  ...
> >
> >  mm/swapfile.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 6c26916e95fd..2937daf3ca02 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1074,7 +1074,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> >  			goto nextsi;
> >  		}
> >  		if (size == SWAPFILE_CLUSTER) {
> > -			if (!(si->flags & SWP_FS))
> > +			if (si->flags & SWP_BLKDEV)
> >  				n_ret = swap_alloc_cluster(si, swp_entries);
> >  		} else
> >  			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,