Re: [PATCH 1/2] xfs: dynamic speculative EOF preallocation

Alex Elder <aelder@xxxxxxx> · Thu, 14 Oct 2010 12:22:45 -0500

On Mon, 2010-10-04 at 21:13 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Currently the size of the speculative preallocation during delayed
> allocation is fixed by either the allocsize mount option of a
> default size. We are seeing a lot of cases where we need to
> recommend using the allocsize mount option to prevent fragmentation
> when buffered writes land in the same AG.
> 
> Rather than using a fixed preallocation size by default (up to 64k),
> make it dynamic by exponentially increasing it on each subsequent
> preallocation. This will result in the preallocation size increasing
> as the file increases, so for streaming writes we are much more
> likely to get large preallocations exactly when we need it to reduce
> fragementation. It should also prevent the need for using the
> allocsize mount option for most workloads involving concurrent
> streaming writes.

I have some comments, below.

> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

. . .

> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 2057614..b2e4782 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c

. . .

> @@ -427,11 +431,25 @@ xfs_iomap_eof_want_preallocate(
>  			if ((imap[n].br_startblock != HOLESTARTBLOCK) &&
>  			    (imap[n].br_startblock != DELAYSTARTBLOCK))
>  				return 0;
> +
>  			start_fsb += imap[n].br_blockcount;
>  			count_fsb -= imap[n].br_blockcount;
> +
> +			/* count delalloc blocks beyond EOF */
> +			if (imap[n].br_startblock == DELAYSTARTBLOCK)
> +				found_delalloc += imap[n].br_blockcount;
>  		}
>  	}
> -	*prealloc = 1;

At this point, count_fsb will be 0 (a necessary condition
for loop termination, since count_fsb is unsigned).  Since
found_delalloc is initially 0 (and is also unsigned), we
can safely assume that found_delalloc >= count_fsb.  The
only case in which found_delalloc <= count_fsb is if
found_delalloc is also 0 (a case you cover separately,
first, below).

Furthermore, *prealloc was assigned the value 0 at the top.
So I think this bottom section can be simplified to:
	if (!found_delalloc)
		*prealloc = 1;

But if that's the case, then maybe the loop can simply
return as soon as it finds a delayed allocation entry.

In other words, the net effect of this is that you
only tell the caller we want preallocation if *no*
preallocated blocks beyond EOF exist.  That may be
fine, but it doesn't seem to match your understanding
based on your code, so I just wanted to call your
attention to it.

> +	if (!found_delalloc) {
> +		/* haven't got any prealloc, so need some */
> +		*prealloc = 1;
> +	} else if (found_delalloc <= count_fsb) {
> +		/* almost run out of prealloc */
> +		*prealloc = 1;
> +	} else {
> +		/* still lots of prealloc left */
> +		*prealloc = 0;
> +	}
>  	return 0;
>  }
>  
> @@ -469,6 +487,7 @@ xfs_iomap_write_delay(
>  	extsz = xfs_get_extsz_hint(ip);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  
> +

This is hunk should be killed.  It just adds an unwanted
blank line.

>  	error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
>  				ioflag, imap, XFS_WRITE_IMAPS, &prealloc);
>  	if (error)
> @@ -476,9 +495,25 @@ xfs_iomap_write_delay(
>  
>  retry:
>  	if (prealloc) {
> +		xfs_fileoff_t	alloc_blocks = 0;
> +		/*
> +		 * If we don't have a user specified preallocation size, dynamically
> +		 * increase the preallocation size as we do more preallocation.
> +		 * Cap the maximum size at a single extent.
> +		 */
> +		if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)) {

I note that this is circumventing the special code in
xfs_set_rw_sizes() that tries to set up a different (smaller)
size in the event the "sync" (generic) mount option was supplied
(indicated by XFS_MOUNT_SYNC).  If that is a good thing, then I
suggest the code in xfs_set_rw_sizes() go away.  But it would be
good to have the case for making that change stated.

> +			alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
> +						(ip->i_last_prealloc * 4));
> +		}

So this is the spot that begs the question of whether the
the default I/O size mount option is needed any more.  The
net effect of your change (assuming no "allocsize" mount
option is in effect) is:
- Initially, ip->i_last_prealloc will be 0.  Therefore the
  first time through, the preallocated blocks beyond the
  end will be based on m_writeio_blocks (either 16KB or
  64KB, dependent on whether XFS_MOUNT_WSYNC was specified).
- Thereafter, whenever more preallocated-at-EOF blocks
  are needed, the number allocated will be 4 times more
  than the last time (growing exponentially), limited by
  the maximum extent size.

I guess the reason one might want the "allocsize" mount
option now becomes the opposite of why one might have
wanted it before.  I.e., it would be used to *reduce*
the size of the preallocated range beyond EOF, which I
could envision might be reasonable in some circumstances.

> +		if (alloc_blocks == 0)
> +			alloc_blocks = mp->m_writeio_blocks;
> +		ip->i_last_prealloc = alloc_blocks;
> +
>  		aligned_offset = XFS_WRITEIO_ALIGN(mp, (offset + count - 1));
>  		ioalign = XFS_B_TO_FSBT(mp, aligned_offset);
> -		last_fsb = ioalign + mp->m_writeio_blocks;
> +		last_fsb = ioalign + alloc_blocks;
> +		printk("ino %lld, ioalign 0x%llx, alloc_blocks 0x%llx\n",
> +				ip->i_ino, ioalign, alloc_blocks);

Kill the debug printk() call.

>  	} else {
>  		last_fsb = XFS_B_TO_FSB(mp, ((xfs_ufsize_t)(offset + count)));
>  	}

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs