Re: bcache causes RCU stalls/bcache_gc hogs CPU

Nikolay Borisov <n.borisov@xxxxxxxxxxxxxx> · Wed, 15 Apr 2015 11:41:31 +0300

Thanks for the patches. I've applied them to 4.0 and am in the process
of testing them.

Do you happen to know which repository (if any) those patches live in, and
whether there is a way to track them reliably (e.g. a repo where they are
being applied)? Or have you just collected them from miscellaneous postings
to the mailing list?

Regards,
Nikolay

On 04/14/2015 11:03 PM, Eric Wheeler wrote:
Apply all of the attached patches to your kernel and try again.

I wish somebody would apply these upstream and get them into the official
kernel. I have been carrying all of these patches with me for some time,
and they definitely make bcache more stable.

-Eric

--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298

On Tue, 14 Apr 2015, Nikolay Borisov wrote:

Hello list,

I'm currently testing bcache with the following setup:

NAME                MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                   8:0    0   1.8T  0 disk
├─sda1                8:1    0     2M  0 part
├─sda2                8:2    0   191M  0 part
└─sda3                8:3    0   1.8T  0 part
  └─main-os (dm-0)  254:0    0   1.8T  0 lvm  /
sdb                   8:16   0 223.1G  0 disk
├─sdb1                8:17   0 188.2M  0 part /boot
└─sdb2                8:18   0 222.9G  0 part
  ├─main-ssd (dm-1) 254:1    0    40G  0 lvm
  │ └─bcache0       253:0    0   182G  0 disk /sdb
  └─main-db (dm-2)  254:2    0   182G  0 lvm
    └─bcache0       253:0    0   182G  0 disk /sdb

So: a 40 GB SSD (main-ssd, an LVM2 volume) acting as the cache for a 180 GB
HDD backing device (main-db, an LVM2 volume), in writeback mode. Every other
setting is at its default. I'm running 4.0-rc6 (!CONFIG_PREEMPT). After
running fio (on a 30 GB file) with a mix of sequential and random I/O, I get
the following RCU warning:

======================================================
INFO: rcu_sched self-detected stall on CPU
         4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100
INFO: rcu_sched detected stalls on CPUs/tasks:
         4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100
         (detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098)
Task dump for CPU 4:
bcache_gc	R  running task    12728 18115      2 0x00000008
  ffff880079e85720 fffffffffffffffc ffff88046c180e20 fffffffffffffffc
  ffffffff81091693 fffffffffffffffc ffff88086aa3d000 ffff88046c180000
  ffff88086aa3d000 ffff88046c180000 ffff88086aa3d060 ffff88086aa3d000
Call Trace:
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] ? bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] ? bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] ? kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ? ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
          (t=2228 jiffies g=2176431 c=2176430 q=3161)
Task dump for CPU 4:
bcache_gc	R  running task    12728 18115      2 0x00000008
  0000000000000005 ffff88046fc83ca8 ffffffff8107720b 0000000000000004
  ffffffff8183d040 ffff88046fc83cc8 ffffffff810772af ffff88046fc83cc8
  ffffffff8183d100 ffff88046fc83cf8 ffffffff810a5101 ffff88046fc94500
Call Trace:
  <IRQ>  [<ffffffff8107720b>] sched_show_task+0xcb/0x130
  [<ffffffff810772af>] dump_cpu_task+0x3f/0x50
  [<ffffffff810a5101>] rcu_dump_cpu_stacks+0x91/0xd0
  [<ffffffff810a68cf>] rcu_check_callbacks+0x65f/0xc30
  [<ffffffff81080ecc>] ? account_process_tick+0x6c/0x170
  [<ffffffff810acf29>] update_process_times+0x39/0x70
  [<ffffffff810beba0>] tick_sched_handle+0x40/0x50
  [<ffffffff810bedb2>] tick_sched_timer+0x52/0xa0
  [<ffffffff810afa16>] __run_hrtimer+0x86/0x1d0
  [<ffffffff810bed60>] ? tick_nohz_handler+0xc0/0xc0
  [<ffffffff810afd92>] hrtimer_interrupt+0x102/0x240
  [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache]
  [<ffffffff81032e79>] local_apic_timer_interrupt+0x39/0x60
  [<ffffffff815bb355>] smp_apic_timer_interrupt+0x45/0x59
  [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache]
  [<ffffffff815b972d>] apic_timer_interrupt+0x6d/0x80
  <EOI>  [<ffffffffa01117c5>] ? __bch_extent_invalid+0xa5/0xd0 [bcache]
  [<ffffffffa0111721>] ? __bch_extent_invalid+0x1/0xd0 [bcache]
  [<ffffffffa0111802>] ? bch_extent_invalid+0x12/0x20 [bcache]
  [<ffffffffa011183d>] bch_extent_bad+0x2d/0x1c0 [bcache]
  [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache]
  [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache]
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90

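The stall summaries above report elapsed time in jiffies. As a rough aid for reading them, here is a minimal Python sketch that extracts the `t=` value from such a line and converts it to seconds. The report does not state CONFIG_HZ, so `hz` is an assumption the caller must supply: at HZ=100, t=2104 jiffies works out to about 21 s, which would be consistent with the usual 21-second CONFIG_RCU_CPU_STALL_TIMEOUT default, but that is an inference rather than something the log confirms. The name `stall_seconds` is hypothetical.

```python
import re

# The "t=<N> jiffies" field printed in rcu_sched stall summaries, e.g.
# "(detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098)".
STALL_T_RE = re.compile(r"\bt=(\d+) jiffies")


def stall_seconds(line: str, hz: int) -> float:
    """Convert the t= field of one stall summary line to seconds.

    hz is the kernel's CONFIG_HZ; it is not given in the report, so the
    caller must supply it (assumption).
    """
    m = STALL_T_RE.search(line)
    if m is None:
        raise ValueError(f"no 't=... jiffies' field in: {line!r}")
    return int(m.group(1)) / hz


if __name__ == "__main__":
    # The two summaries from the log above, at an assumed HZ=100.
    for line in (
        "(detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098)",
        "(t=2228 jiffies g=2176431 c=2176430 q=3161)",
    ):
        print(f"{stall_seconds(line, hz=100):.2f} s")
```

At an assumed HZ=100 this prints 21.04 s and 22.28 s; at HZ=250 the same lines would mean only about 8.4 s and 8.9 s, which is why the HZ value has to be known before reading much into the numbers.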
Naturally, checking
/sys/fs/bcache/b9bcddd1-7a9a-4f2f-88e6-cb5bef6abcf2/internal/btree_gc_max_duration_ms
shows 31593 (i.e. about 31.6 seconds). Clearly, at some point the GC overhead
becomes so large that it causes RCU grace-period stalls. I'm puzzled, though,
since bch_btree_gc_finish(...) is not listed in the trace, and it is the only
bcache GC function that executes code in an RCU read-side critical section.

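To compare the recorded GC time against the stall threshold directly, a minimal Python sketch could read both values from sysfs. It assumes `/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout` is present (it exposes the runtime value of CONFIG_RCU_CPU_STALL_TIMEOUT) and falls back to 21 s, the usual default in kernels of this era; the helper names are hypothetical. A long GC pass is not by itself proof of the cause: on a !CONFIG_PREEMPT kernel, rcu_sched only observes a quiescent state when the CPU context-switches, goes idle, or runs in user mode, so a kernel loop that runs this long without yielding could, in principle, hold up the grace period even outside rcu_read_lock().

```python
from pathlib import Path
from typing import Optional

# Assumed fallback: the usual CONFIG_RCU_CPU_STALL_TIMEOUT default (seconds).
DEFAULT_STALL_TIMEOUT_S = 21
STALL_TIMEOUT_PARAM = Path("/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout")


def read_sysfs_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs attribute, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except OSError:
        return None


def gc_outlasts_stall_timeout(gc_max_ms: int, stall_timeout_s: int) -> bool:
    """True if the longest recorded GC pass is longer than the stall timeout."""
    return gc_max_ms > stall_timeout_s * 1000


if __name__ == "__main__":
    # Cache-set UUID from the report; substitute the one on your system.
    cset = Path("/sys/fs/bcache/b9bcddd1-7a9a-4f2f-88e6-cb5bef6abcf2")
    gc_ms = read_sysfs_int(cset / "internal" / "btree_gc_max_duration_ms")
    timeout_s = read_sysfs_int(STALL_TIMEOUT_PARAM)
    if timeout_s is None:
        timeout_s = DEFAULT_STALL_TIMEOUT_S
    if gc_ms is None:
        print("bcache cache set not found")
    else:
        verdict = "exceeds" if gc_outlasts_stall_timeout(gc_ms, timeout_s) else "is within"
        print(f"max GC pass {gc_ms / 1000:.1f} s {verdict} the {timeout_s} s stall timeout")
```

With the reported 31593 ms and the assumed 21 s timeout, `gc_outlasts_stall_timeout(31593, 21)` returns `True`.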
In addition, I observed that after this RCU stall warning occurs, the
bcache_gc thread hogs the machine at 100%, rendering it unusable. I managed
to get two call-stack dumps via magic SysRq:

=============
NMI backtrace for cpu 4
CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4
Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000
RIP: 0010:[<ffffffffa01098cb>]  [<ffffffffa01098cb>] bch_btree_iter_next_filter+0xb/0x50 [bcache]
RSP: 0018:ffff880868027bd8  EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000002034 RCX: 000000000000000a
RDX: ffffffffa0109920 RSI: ffff88086aa3dcd0 RDI: ffff880868027c08
RBP: ffff880868027bf8 R08: 0000000000000001 R09: 0000000000000001
R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0
R13: ffff880868027c08 R14: ffff880868027cf8 R15: ffff880868027dd8
FS:  0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0
Stack:
  0000000000002034 ffff88086aa3dcd0 ffff880868027c08 ffff880868027cf8
  ffff880868027c78 ffffffffa0109c80 0000000000000004 0000000000000001
  ffff8807740101c0 ffff88077402a9f8 ffff88086aa3dc00 ffff88086aa3d000
Call Trace:
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: ff 48 89 d7 4c 29 cf eb a7 48 29 f2 48 89 d6 e9 18 ff ff ff 66 66 66 2e
0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 <0f> 1f 44 00 00 48
89 fb 49 89 f4 49 89 d6 0f 1f 80 00 00 00 00

===========================================

NMI backtrace for cpu 4
CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4
Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013
task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000
RIP: 0010:[<ffffffffa0111916>]  [<ffffffffa0111916>] bch_extent_bad+0x106/0x1c0 [bcache]
RSP: 0018:ffff880868027ba8  EFLAGS: 00000202
RAX: 000000000000bd2a RBX: ffff88077400a550 RCX: 000000000000000a
RDX: 0000000000000004 RSI: 00000000fc390004 RDI: ffff88046c180000
RBP: ffff880868027bb8 R08: 0000000000000001 R09: 0000000000000000
R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0
R13: ffff88077400a550 R14: ffffffffa0109920 R15: ffff880868027dd8
FS:  0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0
Stack:
  ffff880868027c08 ffff88086aa3dcd0 ffff880868027bc8 ffffffffa010992a
  ffff880868027bf8 ffffffffa01098f9 00000000000014a6 ffff88086aa3dcd0
  ffff880868027c08 ffff880868027cf8 ffff880868027c78 ffffffffa0109c80
Call Trace:
  [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache]
  [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache]
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffff81091693>] ? __wake_up+0x53/0x70
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
  [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0
  [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20
  [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache]
  [<ffffffff81070a9e>] kthread+0xce/0xf0
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff815b8818>] ret_from_fork+0x58/0x90
  [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 33 25 ff 0f 00 00 48 8b 94 c7 40 0c 00 00 48 89 f0 48 8b 92 d8 0a 00 00
48 c1 e8 08 4c 21 d0 48 d3 e8 48 8d 04 40 0f b6 54 82 06 <40> 28 f2 80 fa 80
0f 87 7e 00 00 00 0f b6 d2 83 fa 60 76 66 31

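The two NMI samples can also be compared mechanically. The following minimal Python sketch pulls function names out of an x86 `Call Trace:` block, skipping frames marked `?` (addresses the unwinder found on the stack but could not confirm as real return addresses), and reports the innermost function the two traces share. The helper names are hypothetical, the traces are excerpts of the dumps quoted above (without the RIP lines), and two samples are only weak evidence of where the thread spends its time.

```python
import re
from typing import List, Optional

# One frame of an x86 "Call Trace:" line, e.g.
#   [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
# A "?" before the symbol marks an unreliable (guessed) frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace: str) -> List[str]:
    """Function names of the frames not marked '?', innermost first."""
    return [m.group(2) for m in FRAME_RE.finditer(trace) if m.group(1) is None]


def innermost_common(a: List[str], b: List[str]) -> Optional[str]:
    """First function in trace a (innermost first) that also appears in b."""
    seen_in_b = set(b)
    for func in a:
        if func in seen_in_b:
            return func
    return None


# Excerpts of the two NMI backtraces quoted above.
DUMP_1 = """
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
  [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache]
  [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache]
"""
DUMP_2 = """
  [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache]
  [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache]
  [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache]
  [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache]
"""

if __name__ == "__main__":
    print(innermost_common(reliable_frames(DUMP_2), reliable_frames(DUMP_1)))
```

On these excerpts it prints `btree_gc_count_keys`, i.e. both samples sit under the key-counting walk of the GC recursion.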
In the meantime I'm running the stable 4.0.0 release, where I observe better
results (no bcache_gc thread hog, but still the occasional stall warning).

Regards,
Nikolay
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
