From: Tang Junhui <tang.junhui@xxxxxxxxxx> Hello Matthias, What kind of caching mode do you use? Writeback or Writethrough? > this seems to not have gone through on the first attempt, sorry if you > get this twice. > > Running some fio jobs directly on a writeback bcache device (no fs > mounted), I can hang or crash the kernel with high probability (although > not deterministically). > > for n in $(seq 1 8); do > fio --filename=/dev/bcache0 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=${n} --iodepth=1 --runtime=30 --time_based --group_reporting --name=journal-test > done > > If the backing device is a md raid5, bcache ?ocks up with a kernel BUG > message, "shutdown -r now" hangs, but I can still reboot using magic > sysrq (reisub). > > If the backing device is a md raid10, the machine also shows a kernel > BUG but completely freezes after that. On the VGA console, there is > often another kernel BUG message, which hasn't made it to the serial > console, but which is very similar to the first message. > > There is no hang or crash if I run those tests on the md devices > directly with no bcache. > > This happens with kernel versions 4.15.0 (kernel.org), 4.14.15 > (kernel.org), 4.13.0-32 (Ubuntu Xenial HWE kernel). With an older > 4.10.1, no crashes happen. > > the kernel BUGs seem to happen at different places (from different > runs with 4.15.0): > > [ 438.880774] kernel BUG at block/bio.c:560! > > [ 440.012034] kernel BUG at block/blk-ioc.c:146! > > [ 5378.266726] Kernel BUG at 000000006726b688 [verbose debug info unavailable] > > [ 103.315681] Kernel BUG at 00000000af7724c9 [verbose debug info unavailable] > > > Kernel BUG messages from a hang with raid5 and a lockup with raid10 > caught with USB serial console follow below. > > Regards > Matthias Ferdinand > > ----------------------------------------------------------------------- > > raid5: > > [ 440.012034] kernel BUG at block/blk-ioc.c:146! > [ 440.012696] invalid opcode: 0000 [#1] SMP NOPTI > [ 440.013355] Modules linked in: bcache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables intel_rapl ttm x86_pkg_temp_thermal intel_powerclamp drm_kms_helper coretemp kvm_intel drm kvm ipmi_ssif i2c_algo_bit fb_sys_fops syscopyarea sysfillrect irqbypass sysimgblt crct10dif_pclmul crc32_pclmul hpilo ghash_clmulni_intel gpio_ich shpchp serio_raw pcbc aesni_intel ipmi_si aes_x86_64 crypto_simd ipmi_msghandler dm_multipath glue_helper cryptd acpi_power_meter intel_cstate intel_rapl_perf input_leds lpc_ich mfd_core ie31200_edac btrfs zstd_decompress zstd_compress xxhash > [ 440.023485] raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq hid_generic libcrc32c uas raid1 usbhid psmouse raid0 usb_storage tg3 ahci hid libahci multipath linear [last unloaded: bcache] > [ 440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8 > [ 440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015 > [ 440.028615] RIP: 0010:put_io_context+0x8b/0x90 > [ 440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246 > [ 440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240 > [ 440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0 > [ 440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700 > [ 440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000 > [ 440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0 > [ 440.035239] FS: 0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000 > [ 440.060190] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0 > [ 440.110498] Call Trace: > [ 440.135443] bio_disassociate_task+0x1b/0x60 > [ 440.160355] bio_free+0x1b/0x60 > [ 440.184666] bio_put+0x23/0x30 > [ 440.208272] search_free+0x23/0x40 [bcache] > [ 440.231448] cached_dev_write_complete+0x31/0x70 [bcache] > [ 440.254468] closure_put+0xb6/0xd0 [bcache] > [ 440.277087] request_endio+0x30/0x40 [bcache] > [ 440.298703] bio_endio+0xa1/0x120 > [ 440.319644] handle_stripe+0x418/0x2270 [raid456] > [ 440.340614] ? load_balance+0x17b/0x9c0 > [ 440.360506] handle_active_stripes.isra.58+0x387/0x5a0 [raid456] > [ 440.380675] ? __release_stripe+0x15/0x20 [raid456] > [ 440.400132] raid5d+0x3ed/0x5d0 [raid456] > [ 440.419193] ? schedule+0x36/0x80 > [ 440.437932] ? schedule_timeout+0x1d2/0x2f0 > [ 440.456136] md_thread+0x122/0x150 > [ 440.473687] ? wait_woken+0x80/0x80 > [ 440.491411] kthread+0x102/0x140 > [ 440.508636] ? find_pers+0x70/0x70 > [ 440.524927] ? kthread_associate_blkcg+0xa0/0xa0 > [ 440.541791] ret_from_fork+0x35/0x40 > [ 440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41 > [ 440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8 > [ 440.628575] ---[ end trace a1fd79d85643a73e ]--- > > ----------------------------------------------------------------------- > > raid10: > > [ 438.880774] kernel BUG at block/bio.c:560! > [ 438.881378] invalid opcode: 0000 [#1] SMP NOPTI > [ 438.882197] Modules linked in: bcache ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp ttm coretemp kvm_intel drm_kms_helper kvm drm irqbypass crct10dif_pclmul ipmi_ssif crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper ipmi_si i2c_algo_bit ipmi_msghandler gpio_ich fb_sys_fops lpc_ich hpilo mfd_core acpi_power_meter ie31200_edac syscopyarea sysfillrect sysimgblt serio_raw dm_multipath input_leds cryptd shpchp intel_cstate intel_rapl_perf btrfs zstd_decompress zstd_compress xxhash > [ 438.892634] raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c hid_generic uas raid1 usbhid usb_storage tg3 raid0 psmouse ahci hid libahci multipath linear [last unloaded: bcache] > [ 438.895674] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.15.0 #8 > [ 438.896516] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015 > [ 438.897524] RIP: 0010:bio_put+0x25/0x30 > [ 438.898069] RSP: 0018:ffff8ca2ba303c70 EFLAGS: 00010246 > [ 438.898987] RAX: 0000000000000000 RBX: ffff8ca2a1adf788 RCX: 00000000000f4240 > [ 438.900003] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffffa87782d67c08 > [ 438.901020] RBP: ffff8ca2ba303c70 R08: 0000000000000000 R09: ffffffff85012480 > [ 438.902036] R10: 000000000000012b R11: 000000000000013c R12: ffff8ca28fb30010 > [ 438.903053] R13: 0000000000000000 R14: ffff8ca291327460 R15: 0000000000000000 > [ 438.904237] FS: 0000000000000000(0000) GS:ffff8ca2ba300000(0000) knlGS:0000000000000000 > [ 438.928279] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 438.952654] CR2: 00007fe51a454090 CR3: 00000002c420a004 CR4: 00000000001606e0 > [ 438.977670] Call Trace: > [ 439.002472] <IRQ> > [ 439.026414] search_free+0x23/0x40 [bcache] > [ 439.050258] cached_dev_write_complete+0x31/0x70 [bcache] > [ 439.074040] closure_put+0xb6/0xd0 [bcache] > [ 439.097246] request_endio+0x30/0x40 [bcache] > [ 439.119920] bio_endio+0xa1/0x120 > [ 439.141667] raid_end_bio_io+0x30/0xc0 [raid10] > [ 439.163411] one_write_done+0x35/0x50 [raid10] > [ 439.184486] raid10_end_write_request+0x112/0x250 [raid10] > [ 439.205551] bio_endio+0xa1/0x120 > [ 439.225647] blk_update_request+0xb7/0x310 > [ 439.245295] scsi_end_request+0x34/0x200 > [ 439.264379] scsi_io_completion+0x10d/0x5c0 > [ 439.283087] scsi_finish_command+0xd9/0x120 > [ 439.301348] scsi_softirq_done+0x144/0x170 > [ 439.318718] blk_done_softirq+0x7c/0x90 > [ 439.335596] __do_softirq+0xc9/0x26a > [ 439.352176] irq_exit+0xa5/0xb0 > [ 439.368242] do_IRQ+0x51/0xd0 > [ 439.383765] common_interrupt+0x9f/0x9f > [ 439.399217] </IRQ> > [ 439.414466] RIP: 0010:cpuidle_enter_state+0xeb/0x290 > [ 439.430078] RSP: 0018:ffffa87781917e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb > [ 439.462221] RAX: ffff8ca2ba3229c0 RBX: ffff8ca2ba32b700 RCX: 000000000000001f > [ 439.479067] RDX: 0000000000000000 RSI: ffffffe1cfb49a99 RDI: 0000000000000000 > [ 439.495611] RBP: ffffa87781917eb8 R08: 00000000000003ff R09: 0000000000000187 > [ 439.511910] R10: 000000000000008a R11: 0000000000000018 R12: 0000000000000002 > [ 439.528158] R13: 0000000000000004 R14: ffffffff85171380 R15: 000000662f545620 > [ 439.543608] cpuidle_enter+0x17/0x20 > [ 439.559325] call_cpuidle+0x23/0x40 > [ 439.574921] do_idle+0x185/0x210 > [ 439.590218] cpu_startup_entry+0x1d/0x30 > [ 439.605566] start_secondary+0x133/0x170 > [ 439.620897] secondary_startup_64+0xa5/0xb0 > [ 439.635885] Code: ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 f6 47 15 01 48 89 e5 74 0f 8b 47 74 85 c0 74 0f f0 ff 4f 74 74 02 5d c3 e8 7d ff ff ff 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 > [ 439.684209] RIP: bio_put+0x25/0x30 RSP: ffff8ca2ba303c70 > [ 439.700579] ---[ end trace ce9f8201937066d4 ]--- > [ 439.720653] Kernel panic - not syncing: Fatal exception in interrupt > > ~ 2 seconds later a similar message, but not on the serial console anymore > -- > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html Thanks. Tang Junhui -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html