On Wed, Apr 25, 2018 at 11:22:21AM +0800, Eryu Guan wrote: > On Fri, Apr 06, 2018 at 10:18:15AM -0400, Brian Foster wrote: > > The XFS filestreams allocator caches dir inode -> agno mappings in > > an MRU mechanism that holds elements in memory for an amount of time > > and then cleans up expired elements in the background. The elements > > typically held inode pointers without holding a reference to the > > associated inode. This means that if the inode is reclaimed before > > an expired entry is cleaned up, the MRU reaper can access freed > > memory and cause a panic. > > > > Test for this problem by performing continuous filestreams > > allocations under short-lived parent directory inodes. This will > > produce KASAN use-after-free splats if enabled during the test. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > --- > > > > This test reproduces the problem described in this[1] thread. It's > > XFS-specific because of the filestream option and specific geometry used > > to format the scratch device. > > > > Brian > > > > [1] https://marc.info/?l=linux-xfs&m=152293031029161&w=2 > > From above thread, it seems that we don't need the kernel change > anymore, and the test itself would trigger dmesg check failure when > testing on kernel with kasan enabled, right? > Yep... > But I've looped the test for 200 times and it all passed without > triggering any KASAN warnings, kernel is v4.17-rc2. > The kernel fix ended up being a patch from Christoph. It looks like it made it into v4.17-rc1 as commit 7fcd3efa1e ("xfs: remove filestream item xfs_inode reference"). Could you perhaps try an older kernel or one with that patch reverted? > > > > tests/xfs/445 | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > tests/xfs/445.out | 2 + > > tests/xfs/group | 1 + > > 3 files changed, 114 insertions(+) > > create mode 100755 tests/xfs/445 > > create mode 100644 tests/xfs/445.out > > > > diff --git a/tests/xfs/445 b/tests/xfs/445 > > new file mode 100755 > > index 00000000..331a7dd4 > > --- /dev/null > > +++ b/tests/xfs/445 > > @@ -0,0 +1,111 @@ > > +#! /bin/bash > > +# FS QA Test 445 > > +# > > +# Test the XFS filestreams allocator for use-after-free inode access. The > > +# filestreams allocator uses the MRU and historically kept around unreferenced > > +# inode pointers in each element. These pointers could outlive the inodes they > > +# referred to and thus lead to access of freed or reused memory when the MRU > > +# element was reaped. Test for this problem by performing filestream allocations > > +# against short-lived parent directory inodes. > > +# > > +# Note that some form of kernel debug mechanism for use-after-free detection > > +# (i.e., KASAN) is required for this test to reproduce the original problem. > > +# This is because XFS uses a kmem cache for xfs_inode objects which means that > > +# the backing pages for freed inodes may still reside in the cache with the > > +# freed inodes in a partially initialized state. > > +# > > +#----------------------------------------------------------------------- > > +# Copyright (c) 2018 Red Hat, Inc. All Rights Reserved. > > +# > > +# This program is free software; you can redistribute it and/or > > +# modify it under the terms of the GNU General Public License as > > +# published by the Free Software Foundation. > > +# > > +# This program is distributed in the hope that it would be useful, > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > +# GNU General Public License for more details. > > +# > > +# You should have received a copy of the GNU General Public License > > +# along with this program; if not, write the Free Software Foundation, > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > +#----------------------------------------------------------------------- > > +# > > + > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > +. ./common/filestreams > > + > > +# remove previous $seqres.full before test > > +rm -f $seqres.full > > + > > +# real QA test starts here > > +drop_caches() > > +{ > > + while [ true ]; do > > + echo 2 > /proc/sys/vm/drop_caches > > + sleep 1 > > + done > > +} > > + > > +# Modify as appropriate. > > +_supported_fs generic > > +_supported_os Linux > > +_require_scratch > > + > > +# check for filestreams and min device size > > +_check_filestreams_support || _notrun "filestreams not available" > > +devsize=`blockdev --getsize64 $SCRATCH_DEV` > > There're a few other tests follow the same pattern in getting device > size, seems it's time to make it a helper function :) > TBH I wasn't quite sure this was necessary with a 2GB requirement, but I figured it couldn't hurt just in case. A helper seems fine.. are you thinking something like _require_scratch_size() perhaps? > > +[ $devsize -gt $((2 * 1024 * 1024 * 1024)) ] || _notrun "scratch dev too small" > > + > > +# use small AGs for frequent stream switching > > +_scratch_mkfs_xfs -d agsize=20m,size=2g >> $seqres.full 2>&1 || > > + _fail "mkfs failed" > > +_scratch_mount "-o filestreams" || _fail "mount failed" > > No need to _fail on _scratch_mount failure, it's the default behavior > now, since commit 69eb6281a9d3 ("fstests: _fail test by default when > _scratch_mount fails"). > Ah, right. I'll drop that, thanks! Brian > Thanks, > Eryu > > > + > > +# start background inode reclaim > > +drop_caches & > > +pid=$! > > + > > +# Stress the filestreams allocator via continuous allocation to a file under > > +# different parent dirs. Remove the old dirs as the file is moved so the MRU > > +# references point to an unlinked inode by the time they are removed. If the > > +# old dir inodes are reclaimed and associated memory reused, MRU cleanup can > > +# access the inode after it's been freed. > > +dir=$SCRATCH_MNT > > +for i in $(seq 0 90); do > > + mkdir -p $dir/$i > > + $XFS_IO_PROG -fc "falloc $(($i * 20))m 20m" $dir/$i/file > > + > > + mkdir -p $dir/$((i + 1)) > > + mv $dir/$i/file $dir/$((i + 1))/file > > + rmdir $dir/$i > > + > > + # throttle to ensure this loop sees several cache reclaims > > + sleep 0.1 > > +done > > + > > +kill $pid 2> /dev/null > > +wait $pid 2> /dev/null > > + > > +echo Silence is golden > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/xfs/445.out b/tests/xfs/445.out > > new file mode 100644 > > index 00000000..44e55d20 > > --- /dev/null > > +++ b/tests/xfs/445.out > > @@ -0,0 +1,2 @@ > > +QA output created by 445 > > +Silence is golden > > diff --git a/tests/xfs/group b/tests/xfs/group > > index 831f2cfa..2a7dec6f 100644 > > --- a/tests/xfs/group > > +++ b/tests/xfs/group > > @@ -442,3 +442,4 @@ > > 442 auto stress clone quota > > 443 auto quick ioctl fsr > > 444 auto quick > > +445 auto filestreams > > -- > > 2.13.6 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fstests" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html