On Mon, 18 July 2011 20:57:49 +1000, Dave Chinner wrote:
> On Mon, Jul 18, 2011 at 09:53:39AM +0200, Jörn Engel wrote:
> > On Mon, 18 July 2011 09:32:52 +1000, Dave Chinner wrote:
> > > On Sun, Jul 17, 2011 at 06:05:01PM +0200, Jörn Engel wrote:
> > > > xfs:
> > > > ====
> > > > seqrd   1       2       4       8       16      32      64      128
> > > > 16384   4698    4424    4397    4402    4394    4398    4642    4679
> > > > 8192    6234    5827    5797    5801    5795    6114    5793    5812
> > > > 4096    9100    8835    8882    8896    8874    8890    8910    8906
> > > > 2048    14922   14391   14259   14248   14264   14264   14269   14273
> > > > 1024    23853   22690   22329   22362   22338   22277   22240   22301
> > > > 512     37353   33990   33292   33332   33306   33296   33224   33271

seqrd   1       2       4       8       16      32      64      128
16384   4542    8311    15738   28955   38273   36644   38530   38527
8192    6000    10413   19208   33878   65927   76906   77083   77102
4096    8931    14971   24794   44223   83512   144867  147581  150702
2048    14375   23489   34364   56887   103053  192662  307167  309222
1024    21647   36022   49649   77163   132886  243296  421389  497581
512     31832   61257   79545   108782  176341  303836  517814  584741

Quite a nice improvement for such a small patch.  As they say, "every
small factor of 17 helps". ;)

What bothers me a bit is that the single-threaded numbers took such a
noticeable hit...

> Ok, the patch below takes the numbers on my test setup on a 16k IO
> size:
>
> seqrd     1       2       4       8       16
> vanilla   3603    2798    2563    not tested...
> patches   3707    5746    10304   12875   11016

...in particular when your numbers improve even for a single thread.
Wonder what's going on here.

Anyway, feel free to add a Tested-By: or something from me.  And maybe
fix the two typos below.

> xfs: don't serialise direct IO reads on page cache checks
>
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> There is no need to grab the i_mutex of the IO lock in exclusive
> mode if we don't need to invalidate the page cache. Taking hese
                                                             ^
> locks on every direct IO effective serialisaes them as taking the IO
                                     ^
> lock in exclusive mode has to wait for all shared holders to drop
> the lock. That only happens when IO is complete, so effective it
> prevents dispatch of concurrent direct IO reads to the same inode.
>
> Fix this by taking the IO lock shared to check the page cache state,
> and only then drop it and take the IO lock exclusively if there is
> work to be done. Hence for the normal direct IO case, no exclusive
> locking will occur.
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |   17 ++++++++++++++---
>  1 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index 1e641e6..16a4bf0 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c
> +++ b/fs/xfs/linux-2.6/xfs_file.c
> @@ -321,7 +321,19 @@ xfs_file_aio_read(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>
> -	if (unlikely(ioflags & IO_ISDIRECT)) {
> +	/*
> +	 * Locking is a bit tricky here. If we take an exclusive lock
> +	 * for direct IO, we effectively serialise all new concurrent
> +	 * read IO to this file and block it behind IO that is currently in
> +	 * progress because IO in progress holds the IO lock shared. We only
> +	 * need to hold the lock exclusive to blow away the page cache, so
> +	 * only take lock exclusively if the page cache needs invalidation.
> +	 * This allows the normal direct IO case of no page cache pages to
> +	 * proceeed concurrently without serialisation.
> +	 */
> +	xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
> +	if ((ioflags & IO_ISDIRECT) && inode->i_mapping->nrpages) {
> +		xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
>  		xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
>
>  		if (inode->i_mapping->nrpages) {
> @@ -334,8 +346,7 @@ xfs_file_aio_read(
>  			}
>  		}
>  		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> -	} else
> -		xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
> +	}
>
>  	trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
>
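For anyone following the thread who doesn't live in fs/xfs: the trick
in the patch is the classic optimistic double-check.  Take the lock
shared, test whether the slow path (page cache invalidation) is needed
at all, and only upgrade to exclusive if it is, re-checking the
condition after the upgrade because another thread may have done the
work while we held no lock.  Below is a minimal userspace sketch of
the same idea with POSIX rwlocks, not XFS code; the names cache_dirty,
invalidate_cache() and do_direct_read() are made up for illustration,
and since pthreads cannot atomically demote a write lock to a read
lock, the sketch drops and re-takes it where the patch uses
xfs_rw_ilock_demote():

#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t io_lock = PTHREAD_RWLOCK_INITIALIZER;
static bool cache_dirty = true;	/* stands in for mapping->nrpages != 0 */

static void invalidate_cache(void) { /* flush and toss the page cache */ }
static void do_direct_read(void)   { /* issue the direct IO           */ }

static void direct_read(void)
{
	pthread_rwlock_rdlock(&io_lock);
	if (cache_dirty) {
		/*
		 * Slow path: drop the shared lock and take it exclusive.
		 * We must re-check cache_dirty after re-acquiring, since
		 * another thread may have invalidated the cache in the
		 * window where we held no lock at all.
		 */
		pthread_rwlock_unlock(&io_lock);
		pthread_rwlock_wrlock(&io_lock);
		if (cache_dirty) {
			invalidate_cache();
			cache_dirty = false;
		}
		/* pthreads has no atomic demote; drop and re-take shared */
		pthread_rwlock_unlock(&io_lock);
		pthread_rwlock_rdlock(&io_lock);
	}
	do_direct_read();	/* many readers can be in here at once */
	pthread_rwlock_unlock(&io_lock);
}

static void *reader(void *arg)
{
	(void)arg;
	direct_read();
	return NULL;
}

int main(void)
{
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, reader, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return 0;
}

The payoff is the same as in the patch: in the common case the cache
is clean, nobody ever takes the lock exclusive, and concurrent direct
reads only ever hold it shared.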
Jörn

-- 
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein