On 2019/6/6 5:02 下午, Massimo Burcheri wrote: > Hello, > > I got a system crash with corrupted cashing device. > What is your gcc version ? > The story: Booting after a clean halt was mounting the > btrfs-on-luks-on-bcache as read- only. The production system was > using a 4.19.1 kernel with bcache writeback mode, bfq scheduler. > btrfs scrub showed un-repaired errors. > > Next step was booting a live Linux OpenSuse TW live with a recent > 5.1.5 Kernel. Registering the caching device was segfaulting and > crashing the root shell, leaving the bcache module unusable. This one is important, if I can have the kernel message or call trace of this segfault it will be very helpful. > Booting up again I registered the backing device only and made it > online by echo 1 > running. > > # btrfs check -p /dev/mapper/_dev_bcache Opening filesystem to > check... parent transid verify failed on 67670720512 wanted 718539 > found 715246 parent transid verify failed on 67670720512 wanted > 718539 found 715246 Ignoring transid failure Couldn't setup extent > tree ERROR: cannot open file system > > # mount /dev/mapper/_dev_bcache /mnt/btrfs-top-lvl/ mount: > /mnt/btrfs-top-lvl: wrong fs type, bad option, bad superblock on > /dev/mapper/_dev_bcache, missing codepage or helper program, or > other error. > > Looks like some writeback was missing, transid gap was huge. > > Reading that there have been some important patches on bcache > released in 5.1.6 https://patchwork.kernel.org/patch/10909293/ I > installed a very recent 5.2... kernel and booted. Now registering > the caching device was not segfaulting anymore but freezing without > any return. > > In dmesg I found only this part [see below] which happened earlier > than my register. > > Is my bcache definitely lost? I am not sure for the dirty data on cache, but for the backing device you may have most of data back. Considering there is btrfs on top of it, a fsck is required. You may try to run the backing device wihtout attaching cache device by: echo 1 > /sys/block/bcache0/bcache/running > > Best regards, Massimo (..considering leaving bcache as just another > point-of-failure) > > > [ 12.390390] ------------[ cut here ]------------ [ 12.390392] > kernel BUG at drivers/md/bcache/bset.h:433! [ 12.390399] invalid > opcode: 0000 [#1] SMP PTI [ 12.390402] CPU: 0 PID: 862 Comm: > bcache-register Not tainted 5.2.0-rc3- 1.g038ee83-default #1 > openSUSE Tumbleweed (unreleased) [ 12.390403] Hardware name: > Hewlett-Packard HP EliteBook 8560w/1631, BIOS 68SVD Ver. F.03 > 07/25/2011 It is very suspicious like a corrupted btree node, but I don't have evidence so far. I receive similar report recently but with gcc9 compiled bcache binary. Not sure whether this one is related, but for the gcc9 compiled bcache issue, I am looking at it now. Thanks. Coly Li > [ 12.390413] RIP: 0010:bch_extent_sort_fixup+0x724/0x730 > [bcache] [ 12.390416] Code: ff ff 4c 89 c8 e9 3e ff ff ff 49 39 > f1 0f 97 c1 e9 74 ff ff ff 49 39 f2 41 0f 97 c5 e9 12 ff ff ff 48 > 8b 04 24 e9 88 fa ff ff <0f> 0b 0f 0b 48 29 d0 e9 88 fe ff ff 66 > 66 66 66 90 41 57 41 56 41 [ 12.390417] RSP: > 0000:ffffb40c82047a38 EFLAGS: 00010282 [ 12.390419] RAX: > fffffffffffeb580 RBX: ffff8e2bea3c0020 RCX: 0000000000000000 [ > 12.390420] RDX: 0000000000000001 RSI: 0000000000000001 RDI: > ffffb40c82047af0 [ 12.390421] RBP: ffffb40c82047a90 R08: > 0000000005103b10 R09: ffff8e2bdca28ba0 [ 12.390422] R10: > 0000000000000001 R11: 0000000000000000 R12: 0000000005118630 [ > 12.390423] R13: 0000000005118650 R14: ffffb40c82047ae0 R15: > ffff8e2bea3c0000 [ 12.390424] FS: 00007f672f634bc0(0000) > GS:ffff8e2beda00000(0000) knlGS:0000000000000000 [ 12.390426] CS: > 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.390427] CR2: > 00007f3b27b89900 CR3: 000000042b9d6003 CR4: 00000000000606f0 [ > 12.390428] Call Trace: [ 12.390436] btree_mergesort+0x19b/0x5c0 > [bcache] [ 12.390442] ? bch_cache_allocator_start+0x50/0x50 > [bcache] [ 12.390446] ? __alloc_pages_nodemask+0x13c/0x2d0 [ > 12.390451] __btree_sort+0x9e/0x1d0 [bcache] [ 12.390457] > bch_btree_node_read_done+0x2cb/0x3c0 [bcache] [ 12.390462] > bch_btree_node_read+0xdb/0x180 [bcache] [ 12.390467] ? > bch_keybuf_init+0x60/0x60 [bcache] [ 12.390472] > bch_btree_check_recurse+0x127/0x1f0 [bcache] [ 12.390477] > bch_btree_check+0x18e/0x1b0 [bcache] [ 12.390479] ? > wait_woken+0x70/0x70 [ 12.390486] run_cache_set+0x487/0x730 > [bcache] [ 12.390492] register_bcache+0xbfa/0xf80 [bcache] [ > 12.390495] ? __seccomp_filter+0x7b/0x680 [ 12.390497] ? > kernfs_fop_write+0x101/0x180 [ 12.390502] ? > bch_cache_set_alloc+0x540/0x540 [bcache] [ 12.390504] > kernfs_fop_write+0x101/0x180 [ 12.390507] vfs_write+0xb6/0x1a0 [ > 12.390509] ksys_write+0x4f/0xc0 [ 12.390512] > do_syscall_64+0x60/0x130 [ 12.390516] > entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 12.390518] RIP: > 0033:0x7f672f724854 [ 12.390520] Code: 00 f7 d8 64 89 02 48 c7 c0 > ff ff ff ff eb bb 0f 1f 80 00 00 00 00 48 8d 05 e9 49 0d 00 8b 00 > 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f > 00 48 83 ec 28 48 89 54 24 18 48 [ 12.390521] RSP: > 002b:00007fffebc37088 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ > 12.390523] RAX: ffffffffffffffda RBX: 000000000000000a RCX: > 00007f672f724854 [ 12.390524] RDX: 000000000000000a RSI: > 000055bf4ea5e260 RDI: 0000000000000003 [ 12.390525] RBP: > 000055bf4ea5e260 R08: 00000000ffffffff R09: 000000000000000a [ > 12.390526] R10: 00007fffebc3958b R11: 0000000000000246 R12: > 000000000000000a [ 12.390527] R13: 00007fffebc37110 R14: > 000000000000000a R15: 00007f672f7f47c0 [ 12.390528] Modules > linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp > coretemp kvm_intel kvm msr irqbypass crct10dif_pclmul crc32_pclmul > crc32c_intel mei_wdt mei_hdcp ghash_clmulni_intel iTCO_wdt > iTCO_vendor_support ppdev arc4 bcache crc64 iwldvm mac80211 > aesni_intel snd_hda_codec_idt aes_x86_64 crypto_simd > snd_hda_codec_generic cryptd snd_hda_codec_hdmi ledtrig_audio > glue_helper iwlwifi snd_hda_intel snd_hda_codec snd_hda_core > snd_hwdep cfg80211 snd_pcm joydev hp_wmi sparse_keymap pcspkr > e1000e snd_timer wmi_bmof snd rfkill hp_accel(+) mei_me ptp lpc_ich > soundcore pps_core lis3lv02d mei input_polldev parport_pc > thermal(+) parport tpm_infineon pcc_cpufreq ac battery button uas > usb_storage hid_generic usbhid radeon xhci_pci serio_raw xhci_hcd > firewire_ohci i2c_algo_bit ehci_pci sdhci_pci firewire_core > drm_kms_helper cqhci crc_itu_t sdhci ehci_hcd syscopyarea > sysfillrect sysimgblt fb_sys_fops usbcore ttm mmc_core drm wmi > video sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc [ > 12.390559] scsi_dh_alua [ 12.390564] ---[ end trace > 0613cd8ca3de039c ]--- [ 12.390569] RIP: > 0010:bch_extent_sort_fixup+0x724/0x730 [bcache] >