Re: [PATCH] xfs: optimise CIL insertion during transaction commit [RFC]

Mark Tinguely <tinguely@xxxxxxx> · Wed, 03 Jul 2013 21:09:10 -0500

On 07/01/13 00:44, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx>

Note: This is an RFC right now - it'll need to be broken up into
several patches for final submission.

The CIL insertion during transaction commit currently does multiple
passes across the transaction objects and requires multiple memory
allocations per object that is to be inserted into the CIL. It is
quite inefficient, and as such xfs_log_commit_cil() and it's
children show up quite highly in profiles under metadata
modification intensive workloads.

The current insertion tries to minimise ithe number of times the
xc_cil_lock is grabbed and the hold times via a couple of methods:

	1. an initial loop across the transaction items outside the
	lock to allocate log vectors, buffers and copy the data into
	them.
	2. a second pass across the log vectors that then inserts
	them into the CIL, modifies the CIL state and frees the old
	vectors.

This is somewhat inefficient. While it minimises lock grabs, the
hold time is still quite high because we are freeing objects with
the spinlock held and so the hold times are much higher than they
need to be.

Optimisations that can be made:

	1. change IOP_SIZE() to return the size of the metadata to
	be logged with the number of vectors required. This means we
	can allocate the buffer to hold the metadata as part of the
	allocation for the log vector array. Once allocation instead
	of two.

	2. Store the size of the allocated region for the log vector
	in the log vector itself. This allows us to determine if we
	can just reuse the existing log vector for the new
	transactions. Avoids all memory allocation for that item.

	3. when setting up a new log vector, we don't need to
	hold the CIL lock while determining the difference in it's
	size when compared to the old one. Hence we can do all the
	accounting, swapping of the log vectors and freeing the old
	log vectors outside the xc_cil_lock.

	4. We can do all the CIL insertation of items and busy
	extents and the accounting under a single grab of the
	xc_cil_lock. This minimises the number of times we need to
	grab it and hence contention points are kept to a minimum

	5. the xc_cil_lock is used for serialising both the CIL and
	the push/commit ordering. Separate the two functions into
	different locks so push/commit ordering does not affect CIL
	insertion

	6. the xc_cil_lock shares cachelines with other locks.
	Separate them onto different cachelines.

The result is that my standard fsmark benchmark (8-way, 50m files)
on my standard test VM (8-way, 4GB RAM, 4xSSD in RAID0, 100TB fs)
gives the following results with a xfs-oss tree. No CRCs:

                 vanilla         patched         Difference
create  (time)  483s            435s            -10.0%  (faster)
         (rate)  109k+/6k        122k+/-7k       +11.9%  (faster)

walk            339s            335s            (noise)
      (sys cpu)  1134s           1135s           (noise)

unlink          692s            645s             -6.8%  (faster)

So it's significantly faster than the current code, and lock_stat
reports lower contention on the xc_cil_lock, too. So, big win here.

With CRCs:

                 vanilla         patched         Difference
create  (time)  510s            460s             -9.8%  (faster)
         (rate)  105k+/5.4k      117k+/-5k       +11.4%  (faster)

walk            494s            486s            (noise)
      (sys cpu)  1324s           1290s           (noise)

unlink          959s            889s             -7.3%  (faster)

Gains are of the same order, with walk and unlink still affected by
VFS LRU lock contention. IOWs, with this changes, filesystems with
CRCs enabled will still be faster than the old non-CRC kernels...

Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>

Dave,

The idea to separate the transaction commits from the CIL push looks good.

I am still concerned that the xc_ctx->space_used comparison to
XLOG_CIL_SPACE_LIMIT(log) does not also contain any information on
the "stolen" ticket log space. Maybe the future Liu/Chinner minimum
log space series is the best place to address this and my other log
space concerns.

--Mark.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs