[RFC PATCH v2 14/16] bcache: try to flush btree nodes as many as possible

When btree_flush_write() is called, the journal space is already
exhausted. The current code selects only a single btree node to write
out per call. When many kworkers on the journaling code path call
btree_flush_write() to reclaim journal space, this may cause heavy
cache bouncing of the spinlock across multiple CPU cores.

This patch tries to flush as many btree nodes as possible (up to
FLUSH_BTREE_HEAP, currently 128) inside a single call to
btree_flush_write(), so btree_flush_write() is called less frequently,
which in turn reduces the cache bouncing of the spinlock across
multiple CPU cores. Please note that this patch does not reduce the
total number of spinlock acquisitions: the spinlock is still acquired
each time a single btree node is selected to write out. Instead, the
patch tries its best to keep the spinlock on the same CPU core, which
avoids the cache bouncing that happens when the spinlock is acquired
by multiple different CPU cores.

With this patch applied, in my stress testing, 'top' shows that the
sys CPU time of the kworkers competing for the spinlock inside
btree_flush_write() is reduced by more than 50%.

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/journal.c | 7 ++++++-
 drivers/md/bcache/journal.h | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)
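
For readers who want the control flow without the surrounding kernel
code, here is a minimal userspace sketch of the batching idea described
in the commit message. It is not the bcache implementation: struct
toy_node, struct toy_set, toy_select_dirty() and toy_flush_write() are
hypothetical names made up for illustration, and a plain counter stands
in for the spinlock. Only FLUSH_BTREE_HEAP and its value 128 come from
this patch.

```c
#include <assert.h>
#include <stddef.h>

#define FLUSH_BTREE_HEAP 128	/* batch limit, same value as in the patch */

/* Hypothetical stand-in for struct btree: only tracks a dirty flag. */
struct toy_node {
	int dirty;
};

/* Hypothetical stand-in for struct cache_set. */
struct toy_set {
	struct toy_node *nodes;
	size_t nr_nodes;
	unsigned long lock_acquisitions;	/* counts simulated lock takes */
};

/*
 * Select one dirty node. The "lock" is taken once per selection, which
 * is why batching does not lower the total number of acquisitions.
 */
struct toy_node *toy_select_dirty(struct toy_set *c)
{
	struct toy_node *found = NULL;
	size_t i;

	c->lock_acquisitions++;		/* simulated spin_lock() */
	for (i = 0; i < c->nr_nodes; i++) {
		if (c->nodes[i].dirty) {
			found = &c->nodes[i];
			break;
		}
	}
	/* simulated spin_unlock() would go here */
	return found;
}

/*
 * Flush up to FLUSH_BTREE_HEAP dirty nodes in one call instead of one.
 * Returns the number of nodes flushed by this call.
 */
size_t toy_flush_write(struct toy_set *c)
{
	size_t flushed = 0;
	int n = FLUSH_BTREE_HEAP;

	assert(c != NULL);

	do {
		struct toy_node *b = toy_select_dirty(c);

		if (!b)
			break;		/* nothing left to flush */
		b->dirty = 0;		/* simulated node write */
		flushed++;
	} while (--n > 0);		/* same bound as "if (--n > 0) goto retry" */

	return flushed;
}
```

In this sketch, a set with 300 dirty nodes is drained in three calls
(128, 128 and 44 nodes), while the lock counter still grows by one for
every selection attempt. The expected gain is therefore fewer callers
contending at the same time, not fewer lock acquisitions overall.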

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index bc0e01151155..8536e76fcac9 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -514,6 +514,7 @@ static void btree_flush_write(struct cache_set *c)
 	 */
 	struct btree *b;
 	int i;
+	int n = FLUSH_BTREE_HEAP;
 
 	atomic_long_inc(&c->flush_write);
 
@@ -552,6 +553,10 @@ static void btree_flush_write(struct cache_set *c)
 
 		__bch_btree_node_write(b, NULL);
 		mutex_unlock(&b->write_lock);
+
+		/* try to flush as many btree nodes as possible */
+		if (--n > 0)
+			goto retry;
 	}
 }
 
@@ -1102,7 +1107,7 @@ int bch_journal_alloc(struct cache_set *c)
 	j->w[0].c = c;
 	j->w[1].c = c;
 
-	if (!(init_heap(&c->flush_btree, 128, GFP_KERNEL)) ||
+	if (!(init_heap(&c->flush_btree, FLUSH_BTREE_HEAP, GFP_KERNEL)) ||
 	    !(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
 	    !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) ||
 	    !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)))
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index 55f81443f304..a8be14c6f6d9 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -158,8 +158,8 @@ struct journal_device {
 #define journal_pin_cmp(c, l, r)				\
 	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
 
-#define JOURNAL_PIN	20000
-
+#define FLUSH_BTREE_HEAP	128
+#define JOURNAL_PIN		20000
 /* Reserved jouranl space in sectors */
 #define BCH_JOURNAL_RPLY_RESERVE	6U
 #define BCH_JOURNAL_RESERVE		7U
-- 
2.16.4



