Hello list, I'm currently testing bcache with the following setup: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 1.8T 0 disk ├─sda1 8:1 0 2M 0 part ├─sda2 8:2 0 191M 0 part └─sda3 8:3 0 1.8T 0 part └─main-os (dm-0) 254:0 0 1.8T 0 lvm / sdb 8:16 0 223.1G 0 disk ├─sdb1 8:17 0 188.2M 0 part /boot └─sdb2 8:18 0 222.9G 0 part ├─main-ssd (dm-1) 254:1 0 40G 0 lvm │ └─bcache0 253:0 0 182G 0 disk /sdb └─main-db (dm-2) 254:2 0 182G 0 lvm └─bcache0 253:0 0 182G 0 disk /sdb So a 40gig ssd (main-ssd, lvm2 volume) backed by a 180gig hdd (main-db, lvm 2 volume), using the writeback cache policy. Every other setting is at its default. I'm running the 4.0-rc6 (!CONFIG_PREEMPT). After running fio (using a 30gb file) with a mix of sequential and random i/o and I'm getting the following RCU warn: ====================================================== INFO: rcu_sched self-detected stall on CPU 4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100 INFO: rcu_sched detected stalls on CPUs/tasks: 4: (2099 ticks this GP) idle=fcf/140000000000001/0 softirq=1031582/1031582 fqs=2100 (detected by 16, t=2104 jiffies, g=2176431, c=2176430, q=3098) Task dump for CPU 4: bcache_gc R running task 12728 18115 2 0x00000008 ffff880079e85720 fffffffffffffffc ffff88046c180e20 fffffffffffffffc ffffffff81091693 fffffffffffffffc ffff88086aa3d000 ffff88046c180000 ffff88086aa3d000 ffff88046c180000 ffff88086aa3d060 ffff88086aa3d000 Call Trace: [<ffffffff81091693>] ? __wake_up+0x53/0x70 [<ffffffffa01103d4>] ? bch_btree_gc+0x2f4/0x560 [bcache] [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0 [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20 [<ffffffffa0110678>] ? bch_gc_thread+0x38/0x120 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffff81070a9e>] ? kthread+0xce/0xf0 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815b8818>] ? ret_from_fork+0x58/0x90 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 (t=2228 jiffies g=2176431 c=2176430 q=3161) Task dump for CPU 4: bcache_gc R running task 12728 18115 2 0x00000008 0000000000000005 ffff88046fc83ca8 ffffffff8107720b 0000000000000004 ffffffff8183d040 ffff88046fc83cc8 ffffffff810772af ffff88046fc83cc8 ffffffff8183d100 ffff88046fc83cf8 ffffffff810a5101 ffff88046fc94500 Call Trace: <IRQ> [<ffffffff8107720b>] sched_show_task+0xcb/0x130 [<ffffffff810772af>] dump_cpu_task+0x3f/0x50 [<ffffffff810a5101>] rcu_dump_cpu_stacks+0x91/0xd0 [<ffffffff810a68cf>] rcu_check_callbacks+0x65f/0xc30 [<ffffffff81080ecc>] ? account_process_tick+0x6c/0x170 [<ffffffff810acf29>] update_process_times+0x39/0x70 [<ffffffff810beba0>] tick_sched_handle+0x40/0x50 [<ffffffff810bedb2>] tick_sched_timer+0x52/0xa0 [<ffffffff810afa16>] __run_hrtimer+0x86/0x1d0 [<ffffffff810bed60>] ? tick_nohz_handler+0xc0/0xc0 [<ffffffff810afd92>] hrtimer_interrupt+0x102/0x240 [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache] [<ffffffff81032e79>] local_apic_timer_interrupt+0x39/0x60 [<ffffffff815bb355>] smp_apic_timer_interrupt+0x45/0x59 [<ffffffffa0109920>] ? bch_ptr_invalid+0x10/0x10 [bcache] [<ffffffff815b972d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffffa01117c5>] ? __bch_extent_invalid+0xa5/0xd0 [bcache] [<ffffffffa0111721>] ? __bch_extent_invalid+0x1/0xd0 [bcache] [<ffffffffa0111802>] ? bch_extent_invalid+0x12/0x20 [bcache] [<ffffffffa011183d>] bch_extent_bad+0x2d/0x1c0 [bcache] [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache] [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache] [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache] [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache] [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache] [<ffffffff81091693>] ? __wake_up+0x53/0x70 [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache] [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0 [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20 [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffff81070a9e>] kthread+0xce/0xf0 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815b8818>] ret_from_fork+0x58/0x90 Naturally, checking /sys/fs/bcache/b9bcddd1-7a9a-4f2f-88e6-cb5bef6abcf2/internal/btree_gc_max_duration_ms shows: 31593 Clearly at some point the GC overhead becomes so large that it causes RCU grace period stalls. I'm puzzled since bch_btree_gc_finish(...) is not listed and this is the only function that pertains to bcache gc AND executes code in RCU critical read section. In addition to that I also observed that the after this RCU stall warn occurs the bcache_gc thread hogs the machine at 100% rendering it unusable. I managed to get 2 call stack dumps via magic sysrq as follows: ============= NMI backtrace for cpu 4 CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4 Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013 task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000 RIP: 0010:[<ffffffffa01098cb>] [<ffffffffa01098cb>] bch_btree_iter_next_filter+0xb/0x50 [bcache] RSP: 0018:ffff880868027bd8 EFLAGS: 00000202 RAX: 0000000000000001 RBX: 0000000000002034 RCX: 000000000000000a RDX: ffffffffa0109920 RSI: ffff88086aa3dcd0 RDI: ffff880868027c08 RBP: ffff880868027bf8 R08: 0000000000000001 R09: 0000000000000001 R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0 R13: ffff880868027c08 R14: ffff880868027cf8 R15: ffff880868027dd8 FS: 0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0 Stack: 0000000000002034 ffff88086aa3dcd0 ffff880868027c08 ffff880868027cf8 ffff880868027c78 ffffffffa0109c80 0000000000000004 0000000000000001 ffff8807740101c0 ffff88077402a9f8 ffff88086aa3dc00 ffff88086aa3d000 Call Trace: [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache] [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache] [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache] [<ffffffff81091693>] ? __wake_up+0x53/0x70 [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache] [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0 [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20 [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffff81070a9e>] kthread+0xce/0xf0 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815b8818>] ret_from_fork+0x58/0x90 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 Code: ff 48 89 d7 4c 29 cf eb a7 48 29 f2 48 89 d6 e9 18 ff ff ff 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 41 54 53 <0f> 1f 44 00 00 48 89 fb 49 89 f4 49 89 d6 0f 1f 80 00 00 00 00 =========================================== NMI backtrace for cpu 4 CPU: 4 PID: 18115 Comm: bcache_gc Not tainted 4.0.0-rc6bcache1-nikbor #4 Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.0b 12/05/2013 task: ffff88086ab093e0 ti: ffff880868024000 task.ti: ffff880868024000 RIP: 0010:[<ffffffffa0111916>] [<ffffffffa0111916>] bch_extent_bad+0x106/0x1c0 [bcache] RSP: 0018:ffff880868027ba8 EFLAGS: 00000202 RAX: 000000000000bd2a RBX: ffff88077400a550 RCX: 000000000000000a RDX: 0000000000000004 RSI: 00000000fc390004 RDI: ffff88046c180000 RBP: ffff880868027bb8 R08: 0000000000000001 R09: 0000000000000000 R10: 000007ffffffffff R11: 0000000000000008 R12: ffff88086aa3dcd0 R13: ffff88077400a550 R14: ffffffffa0109920 R15: ffff880868027dd8 FS: 0000000000000000(0000) GS:ffff88046fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4f6b410008 CR3: 000000000180e000 CR4: 00000000001406e0 Stack: ffff880868027c08 ffff88086aa3dcd0 ffff880868027bc8 ffffffffa010992a ffff880868027bf8 ffffffffa01098f9 00000000000014a6 ffff88086aa3dcd0 ffff880868027c08 ffff880868027cf8 ffff880868027c78 ffffffffa0109c80 Call Trace: [<ffffffffa010992a>] bch_ptr_bad+0xa/0x10 [bcache] [<ffffffffa01098f9>] bch_btree_iter_next_filter+0x39/0x50 [bcache] [<ffffffffa0109c80>] btree_gc_count_keys+0x50/0x70 [bcache] [<ffffffffa010ffbf>] btree_gc_recurse+0x1bf/0x2e0 [bcache] [<ffffffffa010c4ac>] ? btree_gc_mark_node+0xdc/0x210 [bcache] [<ffffffff81091693>] ? __wake_up+0x53/0x70 [<ffffffffa01103d4>] bch_btree_gc+0x2f4/0x560 [bcache] [<ffffffff8100180b>] ? __switch_to+0xbb/0x5f0 [<ffffffff810911f0>] ? woken_wake_function+0x20/0x20 [<ffffffffa0110678>] bch_gc_thread+0x38/0x120 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffffa0110640>] ? bch_btree_gc+0x560/0x560 [bcache] [<ffffffff81070a9e>] kthread+0xce/0xf0 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815b8818>] ret_from_fork+0x58/0x90 [<ffffffff810709d0>] ? kthread_freezable_should_stop+0x70/0x70 Code: 33 25 ff 0f 00 00 48 8b 94 c7 40 0c 00 00 48 89 f0 48 8b 92 d8 0a 00 00 48 c1 e8 08 4c 21 d0 48 d3 e8 48 8d 04 40 0f b6 54 82 06 <40> 28 f2 80 fa 80 0f 87 7e 00 00 00 0f b6 d2 83 fa 60 76 66 31 In the mean time I'm running the stable 4.0.0 where I observe better results ( no bcache_gc thread hog but still the occasional stall warn) Regards, Nikolay -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html