I have a problem where BCache is hanging.

My hardware is:

1 - Dell PowerEdge R710 w/ 24 x Xeon processors, 96GB of RAM
2 - Micron P320H SSD
3 - LSI storage device connected by a SAS interface

The steps that I take to cause this hang are:

1 - make-bcache -w4k --cache /dev/rssda1 - WORKS
2 - make-bcache --bdev /dev/mapper/largevol - WORKS
3 - echo "/dev/mapper/largevol" > /sys/fs/bcache/register - WORKS
4 - echo "/dev/rssda1" > /sys/fs/bcache/register - HANGS

When it hangs I see the following in dmesg:

[ 3268.467982] bcache: invalidating existing data

Then some time later I get the following error message:

[ 3294.938341] BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:6785]
[ 3294.938345] Modules linked in: binfmt_misc edd mperf fuse loop pciehp pci_hotplug coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper i7core_edac iTCO_wdt iTCO_vendor_support cryptd edac_core lpc_ich aes_x86_64 mtip32xx(O) bnx2 wmi sg mfd_core sr_mod joydev aes_generic hid_generic cdrom acpi_power_meter microcode dcdbas pcspkr serio_raw button rtc_cmos mptctl dm_mirror dm_region_hash dm_log linear usbhid hid uhci_hcd ehci_hcd qla2xxx usbcore usb_common scsi_transport_fc sd_mod scsi_tgt crc_t10dif processor thermal_sys hwmon scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh dm_snapshot dm_mod ext3 mbcache jbd ata_generic ata_piix libata mptsas mptscsih mptbase mpt2sas scsi_transport_sas raid_class scsi_mod
[ 3294.938381] CPU 2
[ 3294.938384] Pid: 6785, comm: kworker/2:2 Tainted: G O 3.6.0-rc3-0.5-default+ #1 Dell Inc. PowerEdge R710/00NH4P
[ 3294.938385] RIP: 0010:[<ffffffff81049b70>] [<ffffffff81049b70>] __do_softirq+0x70/0x210
[ 3294.938392] RSP: 0018:ffff88183f243ee0 EFLAGS: 00000206
[ 3294.938393] RAX: ffff8817dc74dfd8 RBX: ffff88183f24d8c0 RCX: 0000000000000002
[ 3294.938394] RDX: 0000000000000002 RSI: 000000000000004b RDI: ffffffffff5fa380
[ 3294.938394] RBP: ffff88183f243f40 R08: 0000000000000000 R09: ffffffff816057c0
[ 3294.938395] R10: 0000000000000400 R11: ffff88183f2529a0 R12: ffff88183f243e58
[ 3294.938396] R13: ffffffff8147010a R14: ffff88183f243f40 R15: 0000000000000046
[ 3294.938397] FS: 0000000000000000(0000) GS:ffff88183f240000(0000) knlGS:0000000000000000
[ 3294.938399] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3294.938400] CR2: ffffe8ffffa00000 CR3: 00000017dbb00000 CR4: 00000000000007e0
[ 3294.938401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3294.938402] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3294.938403] Process kworker/2:2 (pid: 6785, threadinfo ffff8817dc74c000, task ffff8817d71ea340)
[ 3294.938403] Stack:
[ 3294.938404] ffff88183f24d940 ffff8817dc74dfd8 ffff8817dc74dfd8 042080603f243f08
[ 3294.938407] ffffffff8109715f 0000000a3f243f88 ffffffff00000002 ffff8817dc74dfd8
[ 3294.938410] 0000000000000046 ffff8817d6fdc000 ffff8817d6fdca10 ffff8817dc74ddc8
[ 3294.938412] Call Trace:
[ 3294.938413] <IRQ>
[ 3294.938414] [<ffffffff8109715f>] ? tick_program_event+0x1f/0x30
[ 3294.938424] [<ffffffff814707fc>] call_softirq+0x1c/0x30
[ 3294.938428] [<ffffffff810043c5>] do_softirq+0x65/0xa0
[ 3294.938429] [<ffffffff810499c5>] irq_exit+0xc5/0xe0
[ 3294.938432] [<ffffffff81027759>] smp_apic_timer_interrupt+0x69/0xa0
[ 3294.938434] [<ffffffff8147010a>] apic_timer_interrupt+0x6a/0x70
[ 3294.938435] <EOI>
[ 3294.938436] [<ffffffff8134d23e>] ? invalidate_buckets_lru+0x2fe/0x7f0
[ 3294.938440] [<ffffffff8134d8f5>] invalidate_buckets+0x1c5/0x1f0
[ 3294.938442] [<ffffffff8134dc38>] bch_allocator_thread+0x318/0x690
[ 3294.938447] [<ffffffff81064ab0>] ? wake_up_bit+0x40/0x40
[ 3294.938450] [<ffffffff810708db>] ? complete+0x4b/0x60
[ 3294.938452] [<ffffffff8105c8a3>] process_one_work+0x1d3/0x370
[ 3294.938454] [<ffffffff8134d920>] ? invalidate_buckets+0x1f0/0x1f0
[ 3294.938456] [<ffffffff8105f5e3>] worker_thread+0x133/0x390
[ 3294.938457] [<ffffffff8105f4b0>] ? manage_workers+0x70/0x70
[ 3294.938459] [<ffffffff810643fe>] kthread+0x9e/0xb0
[ 3294.938461] [<ffffffff81470704>] kernel_thread_helper+0x4/0x10
[ 3294.938463] [<ffffffff81064360>] ? kthread_freezable_should_stop+0x70/0x70
[ 3294.938465] [<ffffffff81470700>] ? gs_change+0x13/0x13
[ 3294.938465] Code: 25 20 b0 00 00 41 89 d6 89 4d d0 c7 45 cc 0a 00 00 00 48 89 45 b0 48 89 45 a8 90 65 c7 04 25 00 05 01 00 00 00 00 00 fb 66 66 90 <66> 66 90 45 31 ed 66 2e 0f 1f 84 00 00 00 00 00 49 8d 85 80 40
[ 3300.603968] ata1: lost interrupt (Status 0x58)
[ 3300.646011] ata1: drained 65536 bytes to clear DRQ
[ 3300.646054] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3300.646057] sr 2:0:0:0: CDB:
[ 3300.646058] Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[ 3300.646065] ata1.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
[ 3300.646065] res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[ 3300.646075] ata1.00: status: { DRDY }
[ 3300.646085] ata1: hard resetting link
[ 3301.119798] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 3301.143856] ata1.00: configured for UDMA/100
[ 3301.144955] ata1: EH complete
[ 3322.926498] BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:6785]

This is reproducible. Any ideas on how to proceed or what I can do to help you debug this are most appreciated.

-brad w.
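
For anyone comparing captures, here is a minimal Python sketch of pulling the soft-lockup reports out of saved dmesg text. The function name find_soft_lockups and the regular expression are illustrative assumptions, written only against the two "BUG: soft lockup" lines quoted in this report; other kernel versions may format the message differently.

```python
import re

# Assumed message shape, taken from the two lockup lines in this report:
# [ 3294.938341] BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:6785]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per soft-lockup report found in dmesg_text."""
    return [
        {
            "timestamp": float(m.group("ts")),
            "cpu": int(m.group("cpu")),
            "stuck_seconds": int(m.group("secs")),
            "task": m.group("task"),
        }
        for m in LOCKUP_RE.finditer(dmesg_text)
    ]


# The two lockup lines from the log in this report.
sample = (
    "[ 3294.938341] BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:6785]\n"
    "[ 3322.926498] BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:6785]\n"
)

lockups = find_soft_lockups(sample)
for entry in lockups:
    print(entry)
```

Run against the full log, this makes it easy to check whether every report names the same CPU and the same kworker, as the two reports here do.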