When btree_flush_write() is called, the journal space is already
exhausted. The current code selects only a single btree node to write
out, which may cause heavy cache bouncing of the spinlock across
multiple CPU cores when many kworkers on the journaling code path call
btree_flush_write() to reclaim journal space.

This patch tries to flush as many btree nodes as possible (up to
FLUSH_BTREE_HEAP, currently 128) inside a single call to
btree_flush_write(), so that btree_flush_write() is called less
frequently, which in turn reduces the cache bouncing of the spinlock
across multiple CPU cores. Please note that this patch does not reduce
the total number of spinlock acquisitions: the spinlock is still
acquired each time a btree node is selected to write out. Instead, the
patch tries its best to keep the spinlock on the same CPU core, which
avoids the cache bouncing that happens when the spinlock is acquired by
multiple different CPU cores.

With this patch applied, in my pressure testing, 'top' shows that sys
CPU time is reduced by more than 50% for the kworkers competing for the
spinlock inside btree_flush_write().
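
The batching idea described above can be illustrated with a small
userspace sketch. It is not the bcache code: struct sim_cache,
select_node(), flush_dirty_nodes() and drain() are hypothetical names,
and the spinlock is modelled only as a counter. It shows that a larger
per-call budget lowers the number of calls while the number of lock
acquisitions stays the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names exist in bcache. */
struct sim_stats {
	unsigned long calls;     /* invocations of flush_dirty_nodes() */
	unsigned long lock_acqs; /* simulated spinlock acquisitions */
	unsigned long flushed;   /* nodes written out */
};

struct sim_cache {
	size_t dirty;            /* dirty nodes still pinning journal space */
	struct sim_stats stats;
};

/* Select one dirty node under the simulated lock; returns 0 if none is left. */
static int select_node(struct sim_cache *c)
{
	c->stats.lock_acqs++;    /* one acquisition per selection attempt */
	if (c->dirty == 0)
		return 0;
	c->dirty--;
	return 1;
}

/* Flush up to 'budget' nodes in one call; budget == 1 models the old behaviour. */
static size_t flush_dirty_nodes(struct sim_cache *c, int budget)
{
	size_t done = 0;

	assert(budget > 0);
	c->stats.calls++;
retry:
	if (!select_node(c))
		return done;
	c->stats.flushed++;
	done++;
	/* keep going in the same call, mirroring the "--n > 0" check in the patch */
	if (--budget > 0)
		goto retry;
	return done;
}

/* Drain all dirty nodes, calling the flush routine as often as needed. */
static struct sim_stats drain(size_t dirty, int budget)
{
	struct sim_cache c = { .dirty = dirty };

	while (c.dirty > 0)
		flush_dirty_nodes(&c, budget);
	return c.stats;
}
```

In this model, draining 256 dirty nodes with a budget of 1 takes 256
calls, while a budget of 128 takes 2 calls; both cases perform 256
simulated lock acquisitions. The sketch does not model cache-line
movement between cores, which is where the real saving comes from.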

Signed-off-by: Coly Li <colyli@xxxxxxx>
---
 drivers/md/bcache/journal.c | 7 ++++++-
 drivers/md/bcache/journal.h | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index bc0e01151155..8536e76fcac9 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -514,6 +514,7 @@ static void btree_flush_write(struct cache_set *c)
 	 */
 	struct btree *b;
 	int i;
+	int n = FLUSH_BTREE_HEAP;
 
 	atomic_long_inc(&c->flush_write);
 
@@ -552,6 +553,10 @@ static void btree_flush_write(struct cache_set *c)
 
 		__bch_btree_node_write(b, NULL);
 		mutex_unlock(&b->write_lock);
+
+		/* try to flush btree nodes as many as possible */
+		if (--n > 0)
+			goto retry;
 	}
 }
 
@@ -1102,7 +1107,7 @@ int bch_journal_alloc(struct cache_set *c)
 	j->w[0].c = c;
 	j->w[1].c = c;
 
-	if (!(init_heap(&c->flush_btree, 128, GFP_KERNEL)) ||
+	if (!(init_heap(&c->flush_btree, FLUSH_BTREE_HEAP, GFP_KERNEL)) ||
 	    !(init_fifo(&j->pin, JOURNAL_PIN, GFP_KERNEL)) ||
 	    !(j->w[0].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)) ||
 	    !(j->w[1].data = (void *) __get_free_pages(GFP_KERNEL, JSET_BITS)))
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index 55f81443f304..a8be14c6f6d9 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -158,8 +158,8 @@ struct journal_device {
 #define journal_pin_cmp(c, l, r)				\
 	(fifo_idx(&(c)->journal.pin, (l)) > fifo_idx(&(c)->journal.pin, (r)))
 
-#define JOURNAL_PIN	20000
-
+#define FLUSH_BTREE_HEAP	128
+#define JOURNAL_PIN		20000
 /* Reserved jouranl space in sectors */
 #define BCH_JOURNAL_RPLY_RESERVE	6U
 #define BCH_JOURNAL_RESERVE		7U
-- 
2.16.4