Re: [PATCH] xfstests: generic: test for discard properly discarding unused extents





On 4/1/15 2:44 PM, Brian Foster wrote:
> On Mon, Mar 30, 2015 at 03:11:06PM -0400, Jeff Mahoney wrote:
>> This test covers four conditions where discard can potentially
>> fail to discard unused extents completely.
>> 
>> We test, with -o discard and with fstrim, scenarios of removing 
>> many relatively small files and removing several large files.
>> 
>> The important part of the two scenarios is that the large files 
>> must be large enough to span a blockgroup alone. It's possible 
>> for an entire block group to be emptied and dropped without an 
>> opportunity to discard individual extents as would happen with 
>> smaller files.
>> 
>> The test confirms the discards have occurred by using a sparse
>> file mounted via loopback to punch holes and then checking how many
>> blocks are still allocated within the file.
>> 
>> Signed-off-by: Jeff Mahoney <jeffm@xxxxxxxx>
>> ---
> 
> The code looks mostly Ok to me, a few notes below. Those aside, 
> this is a longish test. It takes me about 8 minutes to run on my 
> typical low end vm.

My test hardware is a 16 core / 16 GB RAM machine using a commodity
SSD. It ran pretty quickly.

I suppose I should start by explaining that I wrote the test to be
btrfs specific and then realized that the only thing that was
/actually/ btrfs-specific was the btrfs filesystem sync call. I ran it
on XFS to ensure it worked as expected, but didn't have any reason to
try to adapt it to work in any other environment.

> Is the 1GB block group magic value mutable in any way, or is it a 
> hardcoded thing (for btrfs I presume)? It would be nice if we could
> shrink that a bit. If not, perhaps there are some other ways to
> reduce the runtime...

It's not hardcoded for btrfs, but it is by far the most common block
group size. I'd prefer to test what people are using.

> - Is there any reason a single discard or trim test instance must 
> be all large or small files? In other words, is there something 
> that this wouldn't catch if the 10GB were 50% filled with large 
> files and 50% with small files? That would allow us to trim the 
> maximum on the range of small file creation and only have two 
> invocations instead of four.

Only to draw attention to the obvious failure cases, which are
probably specific to btrfs. If a file spans an entire block group and
is removed, it skips the individual discards and depends on the block
group removal to discard the entire thing (this wasn't happening). If
there are lots of small files, it hits different paths, and I wanted
to make it clear which one each mode of the test was targeting.
Otherwise, whoever hits the failure is going to end up having to do it
manually, which defeats the purpose of having an automated test case,
IMO.

> - If the 1GB thing is in fact a btrfs thing, could we make the
> core test a bit more size agnostic (e.g., perhaps pass the file 
> count/size values as parameters) and then scale the parameters up 
> exclusively for btrfs? For example, set defaults of fssize=1G, 
> largefile=100MB, smallfile=[512b-5MB] or something of that nature 
> and override them to the 10GB, 1GB, 32k-... values for btrfs? That 
> way we don't need to write as much data for fs' where it might not 
> be necessary.

If someone wants to weigh in on what sane defaults for other file
systems might be, sure.
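
If we do go that route, I'd imagine it ends up looking roughly like the
sketch below (variable names here are just illustrative, not existing
xfstests knobs), with the rest of the test consuming the variables
instead of hardcoded sizes:

	# Sketch: defaults along the lines Brian suggested, scaled back
	# up on btrfs so a "large" file still spans a whole 1GB block group.
	fssize_mb=1200		# size of the sparse backing image, in MB
	largefile_mb=100	# size of each large file, in MB
	nr_large=8		# number of large files to create

	if [ "$FSTYP" = "btrfs" ]; then
		fssize_mb=10240
		largefile_mb=1200
	fi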

>> tests/generic/326     | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> tests/generic/326.out |   5 ++
>> tests/generic/group   |   1 +
>> 3 files changed, 170 insertions(+)
>> create mode 100644 tests/generic/326
>> create mode 100644 tests/generic/326.out
>> 
>> diff --git a/tests/generic/326 b/tests/generic/326
>> new file mode 100644
>> index 0000000..923a27f
>> --- /dev/null
>> +++ b/tests/generic/326
>> @@ -0,0 +1,164 @@
>> +#! /bin/bash
>> +# FSQA Test No. 326
>> +#
>> +# This test uses a loopback mount with PUNCH_HOLE support to test
>> +# whether discard operations are working as expected.
>> +#
>> +# It tests both -odiscard and fstrim.
>> +#
>> +# Copyright (C) 2015 SUSE. All Rights Reserved.
>> +# Author: Jeff Mahoney <jeffm@xxxxxxxx>
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +loopdev=
>> +tmpdir=
>> +_cleanup()
>> +{
>> +	[ -n "$tmpdir" ] && umount $tmpdir
>> +	[ -n "$loopdev" ] && losetup -d $loopdev
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# real QA test starts here
>> +_need_to_be_root
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_scratch
>> +_require_fstrim
>> +
>> +rm -f $seqres.full
>> +
>> +_scratch_mkfs &>> $seqres.full
>> +_require_fs_space $SCRATCH_MNT $(( 10 * 1024 * 1024 ))
>> +_scratch_mount
>> +
>> +test_discard()
>> +{
>> +	discard=$1
>> +	files=$2
>> +
>> +	tmpfile=$SCRATCH_MNT/testfs.img.$$
>> +	tmpdir=$SCRATCH_MNT/testdir.$$
>> +	mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
>> +
>> +	# Create a sparse file to host the file system
>> +	dd if=/dev/zero of=$tmpfile bs=1M count=1 seek=10240 &> $seqres.full \
>> +		|| _fail "!!! failed to create fs image file"
> 
> xfs_io -c "truncate ..." ?

Sure, I can do that.
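Presumably just:

	# Create a sparse file to host the file system
	$XFS_IO_PROG -f -c "truncate 10g" $tmpfile \
		|| _fail "!!! failed to create fs image file"

(untested, but that's the shape of it).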

>> +
>> +	opts=""
>> +	if [ "$discard" = "discard" ]; then
>> +		opts="-o discard"
>> +	fi
>> +	losetup -f $tmpfile
>> +	loopdev=$(losetup -j $tmpfile|awk -F: '{print $1}')
>> +	_mkfs_dev $loopdev &> $seqres.full
>> +	$MOUNT_PROG $opts $loopdev $tmpdir \
>> +		|| _fail "!!! failed to loopback mount"
>> +
>> +	if [ "$files" = "large" ]; then
>> +		# Create files larger than 1GB so each one occupies
>> +		# more than one block group
>> +		for n in $(seq 1 8); do
>> +			dd if=/dev/zero of=$tmpdir/file$n bs=1M count=1200 \
>> +				&> $seqres.full
>> +		done
>> +	else
>> +		# Create up to 40k files sized 32k-1GB.
>> +		mkdir -p $tmpdir/testdir
>> +		for ((i = 1; i <= 40000; i++)); do
>> +			SIZE=$(( $((1024*1024*1024)) / $(( $RANDOM + 1 )) ))
>> +			$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" \
>> +				$tmpdir/testdir/"${prefix}_$i" &> /dev/null
>> +			if [ $? -ne 0 ]; then
>> +				echo "Failed creating file ${prefix}_$i" \
> 
> ${prefix} doesn't evaluate to anything in my run of this test. $seq
> perhaps?

I lifted the loop from tests/generic/038 which took an argument. I
suppose I could make it ${seq}. It's not really necessary at all.
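If I keep it, the loop body would just become something like:

			$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" \
				$tmpdir/testdir/"${seq}_$i" &> /dev/null
			if [ $? -ne 0 ]; then
				echo "Failed creating file ${seq}_$i" \
					>>$seqres.full
				break
			fi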

>> +					>>$seqres.full
>> +				break
>> +			fi
>> +		done
>> +	fi
>> +
>> +	sync
>> +	OSIZE="$(du -k $tmpfile|awk '{print $1}')"
>> +	du -k $tmpfile >> $seqres.full
>> +
>> +	if [ "$files" = "large" ]; then
>> +		rm $tmpdir/file?
>> +	else
>> +		rm -rf $tmpdir/testdir
>> +	fi
>> +
>> +	# Ensure everything's actually on the hosted file system
>> +	if [ "$FSTYP" = "btrfs" ]; then
>> +		_run_btrfs_util_prog filesystem sync $tmpdir
>> +	fi
>> +	sync
>> +	sync
> 
> Any reason for the double syncs?

Superstition? IIRC, at one point in what is probably the ancient past,
btrfs needed two syncs to be safe. I kept running into false failures
without both a sync and the btrfs filesystem sync, so I just hit the
"no really, just do it" button.

>> +	if [ "$discard" = "trim" ]; then
>> +		$FSTRIM_PROG $tmpdir
>> +	fi
>> +
>> +	$UMOUNT_PROG $tmpdir
>> +	rmdir $tmpdir
>> +	tmpdir=
>> +
>> +	# Sync the backing file system to ensure the hole punches have
>> +	# happened and we can trust the result.
>> +	if [ "$FSTYP" = "btrfs" ]; then
>> +		_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
>> +	fi
>> +	sync
>> +	sync
>> +
>> +	NSIZE="$(du -k $tmpfile|awk '{print $1}')"
>> +	du -k $tmpfile >> $seqres.full
>> +
>> +	# Going from ~ 10GB to 50MB is a good enough test to account for
>> +	# metadata remaining on different file systems.
>> +	if [ "$NSIZE" -gt $(( 50 * 1024 )) ]; then
>> +		_fail "TRIM failed: before rm ${OSIZE}kB, after rm ${NSIZE}kB"
>> +	fi
>> +	rm $tmpfile
>> +	losetup -d $loopdev
>> +	loopdev=
>> +}
>> +
>> +
>> +echo "Testing with -odiscard, many small files"
>> +test_discard discard many
>> +
>> +echo "Testing with -odiscard, several large files"
>> +test_discard discard large
>> +
>> +echo "Testing with fstrim, many small files"
>> +test_discard trim many
>> +
>> +echo "Testing with fstrim, several large files"
>> +test_discard trim large
>> +
>> +status=0
>> +exit
>> 
>> diff --git a/tests/generic/326.out b/tests/generic/326.out
>> new file mode 100644
>> index 0000000..37b0f2c
>> --- /dev/null
>> +++ b/tests/generic/326.out
>> @@ -0,0 +1,5 @@
>> +QA output created by 084
> 
> 326

Thanks. I was just running it directly and missed this once I renamed
it from tests/btrfs.

Thanks for the review,

-Jeff

> Brian
> 
>> +Testing with -odiscard, many small files
>> +Testing with -odiscard, several large files
>> +Testing with fstrim, many small files
>> +Testing with fstrim, several large files
>> diff --git a/tests/generic/group b/tests/generic/group
>> index d56d3ce..75e17fa 100644
>> --- a/tests/generic/group
>> +++ b/tests/generic/group
>> @@ -183,3 +183,4 @@ 323 auto aio stress
>>  324 auto fsr quick
>>  325 auto quick data log
>> +326 auto metadata
>> -- 
>> 1.8.5.6
>> 
>> 
> 


-- 
Jeff Mahoney
SUSE Labs