Re: [RFC PATCH 0/3]: Extreme fragmentation ahoy!

On Thu, Feb 07, 2019 at 04:08:10PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> I've just finished analysing an IO trace from an application
> generating an extreme filesystem fragmentation problem that started
> with extent size hints and ended with spurious ENOSPC reports due to
> massively fragmented files and free space. While the ENOSPC issue
> looks to have previously been solved, I still wanted to understand
> how the application had so comprehensively defeated extent size
> hints as a method of avoiding file fragmentation.
> 
> The key behaviour that I discovered was that specific "append write
> only" files that had extent size hints to prevent fragmentation
> weren't actually write only.  The application didn't do a lot of
> writes to the file, but it kept the file open and appended to it
> (from the traces I have) in chunks of between ~3000 bytes and
> ~160000 bytes. This didn't explain the problem. I did notice that
> the files were opened O_SYNC, however.
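
If I'm reading that right, the writer side amounts to something like the
sketch below -- the path, the 16m hint and the 4k append size are all
made up, but the pattern is O_SYNC appends from a single long-lived fd,
each well below the hint size:

	logfile=/mnt/scratch/app.log	# hypothetical path
	rm -f $logfile
	# one xfs_io invocation == one long-lived O_SYNC fd doing the appends
	xfs_io -f -s -c "extsize 16m" \
		-c "pwrite 0 4k" \
		-c "pwrite 4k 4k" \
		-c "pwrite 8k 4k" \
		$logfile
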
> 
> I then found there was another process that, once every second, opened the
> log file O_RDONLY, read 28 bytes from offset zero, then closed the
> file. Every second. IOWs, between every appending write that would
> allocate an extent size hint worth of space beyond EOF and then
> write a small chunk of it, there were numerous open/read/close
> cycles being done on the same file.
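
...and the reader is then literally just this, pointed at the same
(made-up) path as above:

	while sleep 1; do
		# open O_RDONLY, read 28 bytes at offset 0, close
		xfs_io -r -c "pread 0 28" $logfile > /dev/null
	done
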
> 
> And what do we do on close()? We call xfs_release() and that can
> truncate away blocks beyond EOF. For some reason the close wasn't
> triggering the IDIRTY_RELEASE heuristic that prevents close from
> removing EOF blocks prematurely. Then I realised that O_SYNC writes
> don't leave delayed allocation blocks behind - they are always
> converted in the context of the write. That's why it wasn't
> triggering, and that meant that the open/read/close cycle was
> removing the extent size hint allocation beyond EOF prematurely.

<urk>

> Then it occurred to me that extent size hints don't use delalloc
> either, so they behave the same way as O_SYNC writes in this
> situation.
> 
> Oh, and we remove EOF blocks on O_RDONLY file close, too. i.e. we
> modify the file without having write permissions.

Yikes!
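
A quick way to watch that happen on a kernel without these patches is to
keep the writer's fd open in one xfs_io process while a read-only open of
the same file comes and goes.  The path and hint size below are made up,
but if I've understood the problem correctly, the second bmap should show
the post-EOF allocation gone -- removed by the close of the read-only fd:

	f=/mnt/scratch/prealloc.test	# hypothetical path
	rm -f $f
	xfs_io -f -c "extsize 16m" -c "pwrite 0 4k" \
	       -c "bmap -vp" \
	       -c "open -r $f" -c "close" \
	       -c "file 0" -c "bmap -vp" \
	       $f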

> I suspect there are more cases like this when combined with repeated
> open/<do_something>/close operations on a file that is being
> written, but the patches address just the ones I've talked
> about. The test script to reproduce them is below. Fragmentation
> reduction results are in the commit descriptions. It's been running
> through fstests for a couple of hours now; no issues have been
> noticed yet.
> 
> FWIW, I suspect we need to have a good hard think about whether we
> should be trimming EOF blocks on close by default, or whether we
> should only be doing it in very limited situations....
> 
> Comments, thoughts, flames welcome.
> 
> -Dave.
> 
> 
> #!/bin/bash
> #
> # Test 1

Can you please turn these into fstests to cause the maintainer maximal
immediate pain^W^W^Wmake everyone pay attention^W^W^W^Westablish a basis
for regression testing and finding whatever other problems we can find
from digging deeper? :)

--D

> #
> # Write multiple files in parallel using synchronous buffered writes. Aim is to
> # interleave allocations to fragment the files. Synchronous writes defeat the
> # open/write/close heuristics in xfs_release() that prevent EOF block removal,
> # so this should fragment badly.
> 
> workdir=/mnt/scratch
> nfiles=8
> wsize=4096
> wcnt=1000
> 
> echo
> echo "Test 1: sync write fragmentation counts"
> echo
> write_sync_file()
> {
> 	idx=$1
> 
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -s -c "pwrite $((cnt * wsize)) $wsize" $workdir/file.$idx
> 	done
> }
> 
> rm -f $workdir/file*
> for ((n=0; n<$nfiles; n++)); do
> 	write_sync_file $n > /dev/null 2>&1 &
> done
> wait
> 
> sync
> 
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done;
> 
> 
> # Test 2
> #
> # Same as test 1, but instead of sync writes, use extent size hints to defeat
> # the open/write/close heuristic
> 
> extent_size=16m
> 
> echo
> echo "Test 2: Extent size hint fragmentation counts"
> echo
> 
> write_extsz_file()
> {
> 	idx=$1
> 
> 	xfs_io -f -c "extsize $extent_size" $workdir/file.$idx
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -c "pwrite $((cnt * wsize)) $wsize" $workdir/file.$idx
> 	done
> }
> 
> rm -f $workdir/file*
> for ((n=0; n<$nfiles; n++)); do
> 	write_extsz_file $n > /dev/null 2>&1 &
> done
> wait
> 
> sync
> 
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done;
> 
> 
> 
> # Test 3
> #
> # Same as test 2, but instead of extent size hints, use open/read/close loops
> # on the files to remove EOF blocks.
> 
> echo
> echo "Test 3: Open/read/close loop fragmentation counts"
> echo
> 
> write_file()
> {
> 	idx=$1
> 
> 	xfs_io -f -s -c "pwrite -b 64k 0 50m" $workdir/file.$idx
> }
> 
> read_file()
> {
> 	idx=$1
> 
> 	for ((cnt=0; cnt<$wcnt; cnt++)); do
> 		xfs_io -f -r -c "pread 0 28" $workdir/file.$idx
> 	done
> }
> 
> rm -f $workdir/file*
> for ((n=0; n<$((nfiles * 4)); n++)); do
> 	write_file $n > /dev/null 2>&1 &
> 	read_file $n > /dev/null 2>&1 &
> done
> wait
> 
> sync
> 
> for ((n=0; n<$nfiles; n++)); do
> 	echo -n "$workdir/file.$n: "
> 	xfs_bmap -vp $workdir/file.$n | wc -l
> done;
> 
> 


