Re: [PATCH] xfs: speed up directory bestfree block scanning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 03, 2018 at 05:27:48PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When running a "create millions inodes in a directory" test
> recently, I noticed we were spending a huge amount of time
> converting freespace block headers from disk format to in-memory
> format:
> 
>  31.47%  [kernel]  [k] xfs_dir2_node_addname
>  17.86%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
>   3.55%  [kernel]  [k] xfs_dir3_free_bests_p
> 
> We shouldn't be hitting the best free block scanning code so hard
> when doing sequential directory creates, and it turns out there's
> a highly suboptimal loop searching the the best free array in
> the freespace block - it decodes the block header before checking
> each entry inside a loop, instead of decoding the header once before
> running the entry search loop.
> 
> This makes a massive difference to create rates. Profile now looks
> like this:
> 
>   13.15%  [kernel]  [k] xfs_dir2_node_addname
>    3.52%  [kernel]  [k] xfs_dir3_leaf_check_int
>    3.11%  [kernel]  [k] xfs_log_commit_cil
> 
> And the wall time/average file create rate differences are
> just as stark:
> 
> 		create time(sec) / rate (files/s)
> File count	     vanilla		    patched
>   10k		   0.54 / 18.5k		   0.53 / 18.9k
>   20k		   1.10	/ 18.1k		   1.05 / 19.0k
>  100k		   4.21	/ 23.8k		   3.91 / 25.6k
>  200k		   9.66	/ 20,7k		   7.37 / 27.1k
>    1M		  86.61	/ 11.5k		  48.26 / 20.7k
>    2M		 206.13	/  9.7k		 129.71 / 15.4k
> 
> The larger the directory, the bigger the performance improvement.

FWIW, while this improves the free block scanning code, it still
doesn't really fix the overall problem with the free space index
blocks. The problem there is that the free space index is
essentially a linear array, with one index per allocated data
block in the directory. This array is scanned linearly on every
insert to find the first block with enough free space to hold the
new entry. That, unfortunately, is problematic when new blocks are
always added at the tail of the list. hence we have to scan the
the entire free space index before we find the free space the last
insert added.

That's nasty - the result is that by the time we have 5 million
entries in the directory we have something like 60,000 4k blocks in
the filesystem and only the last one has free space. Hence we're
searching 60,000 entries just to find where to insert the next
entry. The result is this:

		    create time(sec)
File count	 vanilla        patched
 200k		   9.66		   7.37
   1M		  86.61		  48.26
   2M		 206.13		 129.71
  10M		2843.57		1817.39

it goes highly non-linear at above 1M entries in the directory.

In writing this, I think I can see a quick and simple change that
will fix this case and improve most other directory grow workloads
without affecting normal random directory insert/remove performance.
That is, do a reverse order search starting at the last block rather
than increasing order search starting at the first block.....

Ok, now were are talking - performance and scalability improvements!

		create time(sec) / rate (files/s)
 File count     vanilla		    loop-fix		+reverse
   10k	      0.54 / 18.5k	   0.53 / 18.9k	       0.52 / 19.3k
   20k	      1.10 / 18.1k	   1.05 / 19.0k	       1.00 / 20.0k
  100k	      4.21 / 23.8k	   3.91 / 25.6k	       3.58 / 27.9k
  200k	      9.66 / 20,7k	   7.37 / 27.1k	       7.08 / 28.3k
    1M	     86.61 / 11.5k	  48.26 / 20.7k	      38.33 / 26.1k
    2M	    206.13 /  9.7k	 129.71 / 15.4k	      82.20 / 24.3k
   10M	   2843.57 /  3.5k	1817.39 /  5.5k      591.78 / 16.9k

Theres still some non-linearity as we approach the 10M number, but
it's still 5x faster to 10M inodes than the existing code....

Reverse search patch - working but not finished, smoke tested only -
is attached below. Discuss!

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

xfs: reverse search directory freespace indexes

From: Dave Chinner <dchinner@xxxxxxxxxx>

When a directory is growing rapidly, new blocks tend to get added at
the end of teh directory. These end up at teh end of teh freespace
index, and when teh directory gets large finding these new
freespaces gets expensive. The code does a linear search across the
frespace index from the first block in teh directory to teh last,
hence meaning the newly added space is the last index searched.

Instead, do a reverse order index search, starting from the last
block and index in the freespace index. This makes most lookups for
free space on rapidly growing directories O(1) instead of O(N).
The result is a major improvement in large directory grow rates:

                create time(sec) / rate (files/s)
 File count     vanilla             patched
   10k        0.54 / 18.5k         0.52 / 19.3k
   20k        1.10 / 18.1k         1.00 / 20.0k
  100k        4.21 / 23.8k         3.58 / 27.9k
  200k        9.66 / 20,7k         7.08 / 28.3k
    1M       86.61 / 11.5k        38.33 / 26.1k
    2M      206.13 /  9.7k        82.20 / 24.3k
   10M     2843.57 /  3.5k       591.78 / 16.9k

Signed-Off-By: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/libxfs/xfs_dir2_node.c | 91 ++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 53 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index bcf0d43cd6a8..d95229a9d700 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1701,10 +1701,11 @@ xfs_dir2_node_addname_int(
 	int			error;		/* error return value */
 	xfs_dir2_db_t		fbno;		/* freespace block number */
 	struct xfs_buf		*fbp;		/* freespace buffer */
-	int			findex;		/* freespace entry index */
-	xfs_dir2_free_t		*free=NULL;	/* freespace block structure */
+	int			findex = -1;	/* freespace entry index */
+	xfs_dir2_free_t		*free = NULL;	/* freespace block structure */
 	xfs_dir2_db_t		ifbno;		/* initial freespace block no */
-	xfs_dir2_db_t		lastfbno=0;	/* highest freespace block no */
+	xfs_dir2_db_t		lastfbno = 0;	/* highest freespace block no */
+	xfs_dir2_db_t		firstfbno = 0;	/* lowest freespace block no */
 	int			length;		/* length of the new entry */
 	int			logfree;	/* need to log free entry */
 	xfs_mount_t		*mp;		/* filesystem mount point */
@@ -1715,11 +1716,17 @@ xfs_dir2_node_addname_int(
 	__be16			*bests;
 	struct xfs_dir3_icfree_hdr freehdr;
 	struct xfs_dir2_data_free *bf;
+	xfs_fileoff_t		fo;		/* freespace block number */
 
 	dp = args->dp;
 	mp = dp->i_mount;
 	tp = args->trans;
 	length = dp->d_ops->data_entsize(args->namelen);
+
+	ifbno = -1;
+	dbno = -1;
+	fbp = NULL;
+
 	/*
 	 * If we came in with a freespace block that means that lookup
 	 * found an entry with our hash value.  This is the freespace
@@ -1746,62 +1753,38 @@ xfs_dir2_node_addname_int(
 			ASSERT(be16_to_cpu(bests[findex]) != NULLDATAOFF);
 			ASSERT(be16_to_cpu(bests[findex]) >= length);
 			dbno = freehdr.firstdb + findex;
-		} else {
-			/*
-			 * The data block looked at didn't have enough room.
-			 * We'll start at the beginning of the freespace entries.
-			 */
-			dbno = -1;
-			findex = 0;
+			goto block_found;
 		}
-	} else {
-		/*
-		 * Didn't come in with a freespace block, so no data block.
-		 */
-		ifbno = dbno = -1;
-		fbp = NULL;
-		findex = 0;
 	}
 
 	/*
-	 * If we don't have a data block yet, we're going to scan the
+	 * If we don't have a usable data block yet, we're going to scan the
 	 * freespace blocks looking for one.  Figure out what the
-	 * highest freespace block number is.
+	 * highest freespace block number is as we'll start scanning there.
 	 */
-	if (dbno == -1) {
-		xfs_fileoff_t	fo;		/* freespace block number */
+	error = xfs_bmap_last_offset(dp, &fo, XFS_DATA_FORK);
+	if (error)
+		return error;
+	lastfbno = xfs_dir2_da_to_db(args->geo, (xfs_dablk_t)fo);
+	firstfbno = xfs_dir2_byte_to_db(args->geo, XFS_DIR2_FREE_OFFSET);
 
-		if ((error = xfs_bmap_last_offset(dp, &fo, XFS_DATA_FORK)))
-			return error;
-		lastfbno = xfs_dir2_da_to_db(args->geo, (xfs_dablk_t)fo);
-		fbno = ifbno;
-	}
 	/*
 	 * While we haven't identified a data block, search the freeblock
-	 * data for a good data block.  If we find a null freeblock entry,
-	 * indicating a hole in the data blocks, remember that.
+	 * data for a good data block. Do a reverse order search, as growing
+	 * directories will put new blocks with free space at the end of the
+	 * free space index.
 	 */
-	while (dbno == -1) {
-		/*
-		 * If we don't have a freeblock in hand, get the next one.
-		 */
-		if (fbp == NULL) {
-			/*
-			 * Happens the first time through unless lookup gave
-			 * us a freespace block to start with.
-			 */
-			if (++fbno == 0)
-				fbno = xfs_dir2_byte_to_db(args->geo,
-							XFS_DIR2_FREE_OFFSET);
-			/*
-			 * If it's ifbno we already looked at it.
-			 */
+	for (fbno = lastfbno; fbno >= firstfbno; fbno--) {
+
+		/* If we don't have a freeblock in hand, get the previous one. */
+		if (!fbp) {
+			/* If it's ifbno we already looked at it. */
 			if (fbno == ifbno)
-				fbno++;
+				fbno--;
 			/*
 			 * If it's off the end we're done.
 			 */
-			if (fbno >= lastfbno)
+			if (fbno < firstfbno)
 				break;
 			/*
 			 * Read the block.  There can be holes in the
@@ -1817,8 +1800,8 @@ xfs_dir2_node_addname_int(
 			if (!fbp)
 				continue;
 			free = fbp->b_addr;
-			findex = 0;
 		}
+
 		/*
 		 * Look at the current free entry.  Is it good enough?
 		 *
@@ -1829,17 +1812,16 @@ xfs_dir2_node_addname_int(
 		 */
 		bests = dp->d_ops->free_bests_p(free);
 		dp->d_ops->free_hdr_from_disk(&freehdr, free);
-		do {
-
+		for (findex = freehdr.nvalid - 1; findex >= 0; findex--){
 			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
 			    be16_to_cpu(bests[findex]) >= length) {
 				dbno = freehdr.firstdb + findex;
-				break;
+				goto block_found;
 			}
-		} while (++findex < freehdr.nvalid);
+		}
 
 		/* Drop the block if we done with the freeblock */
-		if (findex == freehdr.nvalid) {
+		if (findex < 0) {
 			xfs_trans_brelse(tp, fbp);
 			fbp = NULL;
 			if (fblk)
@@ -1847,11 +1829,13 @@ xfs_dir2_node_addname_int(
 		}
 	}
 
+	ASSERT(dbno == -1);
+
 	/*
-	 * If we don't have a data block, we need to allocate one and make
+	 * We don't have a data block so we need to allocate one and make
 	 * the freespace entries refer to it.
 	 */
-	if (unlikely(dbno == -1)) {
+	if (dbno == -1) {
 		/*
 		 * Not allowed to allocate, return failure.
 		 */
@@ -1981,6 +1965,7 @@ xfs_dir2_node_addname_int(
 	 * We had a data block so we don't have to make a new one.
 	 */
 	else {
+block_found:
 		/*
 		 * If just checking, we succeeded.
 		 */
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux