On 2018/7/9 11:51 PM, Cameron Berkenpas wrote:
> Hello,
>
> Thank you for the fast response! Sorry if this message is too verbose...
>
> Yes, ppc64le is just PPC in little endian mode, correct.
>
> How I'm creating the devices for bcache:
> make-bcache -B /dev/sdd1
>
> (I've tried removing the writeback and discard options too)
> make-bcache -C --writeback --discard /dev/sdc1
>
> When attaching the caching device for the *first* time (I suspect this
> is normal):
> [ 193.275300] bcache: register_bdev() registered backing device sdd1
> [ 193.292590] bcache: run_cache_set() invalidating existing data
> [ 223.527043] bcache: register_cache() registered cache device sdc1
> [ 223.534950] bcache: bch_cached_dev_attach() Caching sdd1 as bcache0 on set 6aa362b3-606e-4c51-9bc7-807b8a6a8442
>
> Detaching caching device:
> [ 325.293675] bcache: cached_dev_detach_finish() Caching disabled for sdd1
>
> And finally when I attempt to re-attach, things hang. No messages.
>
> Here's the trace from 'echo l > /proc/sysrq-trigger':
> [ 526.384192] sysrq: SysRq : Show backtrace of all active CPUs
> [ 526.384673] sysrq: CPU20:
> [ 526.384710] Call Trace:
> [ 526.384742] [c000001e5085f930] [c000000000778ce0] showacpu+0x80/0xa0 (unreliable)
> [ 526.384841] [c000001e5085f9a0] [c0000000001d6dd8] flush_smp_call_function_queue+0x128/0x1d0
> [ 526.384958] [c000001e5085fa20] [c00000000004d89c] smp_ipi_demux_relaxed+0x9c/0x110
> [ 526.385075] [c000001e5085fa60] [c000000000048750] doorbell_exception+0xb0/0xf0
> [ 526.385173] [c000001e5085faa0] [c000000000009fa8] h_doorbell_common+0x158/0x160
> [ 526.385282] --- interrupt: e81 at replay_interrupt_return+0x0/0x4
>     LR = arch_local_irq_restore+0x74/0x90
> [ 526.385420] [c000001e5085fd90] [0000000000000014] 0x14 (unreliable)
> [ 526.385511] [c000001e5085fdb0] [c0000000009f82d0] cpuidle_enter_state+0xf0/0x400
> [ 526.385618] [c000001e5085fe10] [c000000000154df0] call_cpuidle+0x70/0xd0
> [ 526.385713] [c000001e5085fe50] [c00000000015554c] do_idle+0x31c/0x3a0
> [ 526.385798] [c000001e5085fec0] [c00000000015582c] cpu_startup_entry+0x3c/0x50
> [ 526.385892] [c000001e5085fef0] [c00000000004ec6c] start_secondary+0x4fc/0x540
> [ 526.385992] [c000001e5085ff90] [c00000000000b270] start_secondary_prolog+0x10/0x14
>
> Now that the re-attach has hung for a while, I have the following:
> [ 605.232666] INFO: task bash:2134 blocked for more than 120 seconds.
> [ 605.232715] Not tainted 4.17.5 #7
> [ 605.232746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 605.232807] bash D 0 2134 2133 0x00040000
> [ 605.232846] Call Trace:
> [ 605.232883] [c000001c0ce8f850] [c00000000001e6ec] __switch_to+0x30c/0x4d0
> [ 605.232957] [c000001c0ce8f8b0] [c000000000c08470] __schedule+0x330/0xa90
> [ 605.233030] [c000001c0ce8f980] [c000000000c08c10] schedule+0x40/0xc0
> [ 605.233113] [c000001c0ce8f9a0] [c000000000c0d708] rwsem_down_write_failed+0x198/0x390
> [ 605.233220] [c000001c0ce8fa50] [c000000000c0c588] down_write+0x78/0xa0
> [ 605.233304] [c000001c0ce8fa80] [c0000000009d79fc] bch_cached_dev_attach+0x35c/0x5c0
> [ 605.233394] [c000001c0ce8fb50] [c0000000009db7b0] __cached_dev_store+0x820/0x8c0
> [ 605.233466] [c000001c0ce8fc00] [c0000000009db8b4] bch_cached_dev_store+0x64/0x1a0
> [ 605.233529] [c000001c0ce8fc50] [c000000000490bcc] sysfs_kf_write+0x7c/0xc0
> [ 605.233585] [c000001c0ce8fc90] [c00000000048f79c] kernfs_fop_write+0x18c/0x250
> [ 605.233693] [c000001c0ce8fce0] [c0000000003c144c] __vfs_write+0x6c/0x1d0
> [ 605.233776] [c000001c0ce8fd80] [c0000000003c1808] vfs_write+0xd8/0x240
> [ 605.233860] [c000001c0ce8fdd0] [c0000000003c1bd0] ksys_write+0x70/0x120
> [ 605.233935] [c000001c0ce8fe30] [c00000000000b9e0] system_call+0x58/0x6c
>
> And finally, here's the stack for that bash process:
> [<0>] (null)
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] bch_cached_dev_attach+0x35c/0x5c0
> [<0>] __cached_dev_store+0x820/0x8c0
> [<0>] bch_cached_dev_store+0x64/0x1a0
> [<0>] sysfs_kf_write+0x7c/0xc0
> [<0>] kernfs_fop_write+0x18c/0x250
> [<0>] __vfs_write+0x6c/0x1d0
> [<0>] vfs_write+0xd8/0x240
> [<0>] ksys_write+0x70/0x120
> [<0>] system_call+0x58/0x6c
>
> In case it's useful, here's the /proc/<pid>/stack of all the bcache
> kernel processes:
>
> [bcache]:
> [<0>] (null)
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] rescuer_thread+0x3a8/0x470
> [<0>] kthread+0x1a8/0x1b0
> [<0>] ret_from_kernel_thread+0x5c/0x8c
>
> [bcache_gc]:
> [<0>] (null)
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] rescuer_thread+0x3a8/0x470
> [<0>] kthread+0x1a8/0x1b0
> [<0>] ret_from_kernel_thread+0x5c/0x8c
>
> [bcache_allocato]:
> [<0>] 0xa00000000
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] bch_allocator_thread+0x2e8/0xde0
> [<0>] kthread+0x1a8/0x1b0
> [<0>] ret_from_kernel_thread+0x5c/0x8c
>
> [bcache_gc]:
> [<0>] (null)
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] bch_gc_thread+0x220/0x260
> [<0>] kthread+0x1a8/0x1b0
> [<0>] ret_from_kernel_thread+0x5c/0x8c
>
> [bcache_writebac]:
> [<0>] (null)
> [<0>] __switch_to+0x30c/0x4d0
> [<0>] rescuer_thread+0x3a8/0x470
> [<0>] kthread+0x1a8/0x1b0
> [<0>] ret_from_kernel_thread+0x5c/0x8c
>

Hi Cameron,

It looks like some kind of deadlock is happening on the writeback_lock
semaphore. The 4.17 kernel is quite fresh, so I suspect the upstream
kernel may have a similar issue.

Is it possible for you to compile and run Linux v4.18-rc3? If so, I will
send you a debug patch that prints kernel messages so we can see what
happens.

I know of a very rare deadlock (which might never happen in practice),
but I am not sure whether it matches your case.

Thanks.

Coly Li
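
As a reading aid for the hung-task trace quoted above, here is a minimal Python sketch (not part of the original exchange) that pulls the symbol names out of ppc64-style stack frames and reports which function called into a blocking lock primitive. The helper names `frames` and `blocked_in`, the regular expression, and the list of lock functions are illustrative assumptions; the sample lines are copied from the trace above with the mail wrapping undone.

```python
import re

# Matches ppc64-style stack frames as they appear in the quoted log, e.g.
#   [c000001c0ce8fa80] [c0000000009d79fc] bch_cached_dev_attach+0x35c/0x5c0
# The dmesg timestamp is not matched because it is not 16 hex digits.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(log_text):
    """Return the symbol names of all stack frames in log_text, innermost first."""
    return [m.group("sym") for m in FRAME_RE.finditer(log_text)]


def blocked_in(log_text, lock_funcs=("down_write", "down_read", "mutex_lock")):
    """Return (lock_function, caller) for the first lock call in the trace, or None.

    lock_funcs is an assumed, non-exhaustive list of blocking primitives.
    """
    syms = frames(log_text)
    for i, sym in enumerate(syms):
        if sym in lock_funcs and i + 1 < len(syms):
            return sym, syms[i + 1]
    return None


# Four frames taken from the hung-task report above.
SAMPLE = """
[ 605.233113] [c000001c0ce8f9a0] [c000000000c0d708] rwsem_down_write_failed+0x198/0x390
[ 605.233220] [c000001c0ce8fa50] [c000000000c0c588] down_write+0x78/0xa0
[ 605.233304] [c000001c0ce8fa80] [c0000000009d79fc] bch_cached_dev_attach+0x35c/0x5c0
[ 605.233394] [c000001c0ce8fb50] [c0000000009db7b0] __cached_dev_store+0x820/0x8c0
"""

if __name__ == "__main__":
    print(blocked_in(SAMPLE))
```

On the sample frames this reports `down_write` called from `bch_cached_dev_attach`, which is consistent with the re-attach stalling while taking a write semaphore in the attach path.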