https://bugzilla.kernel.org/show_bug.cgi?id=198861

Bug ID: 198861
Summary: Regression causes kernel OOPS and hang in SCSI error report
Product: IO/Storage
Version: 2.5
Kernel Version: 4.14.20
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
Assignee: linux-scsi@xxxxxxxxxxxxxxx
Reporter: ncopa@xxxxxxxxxxxxxxx
Regression: No

My ext4 /home directory is on the LVM logical volume /dev/vg1/lv_home, which sits on an mdadm RAID10 array of three Samsung 850 EVO SSDs.

After I upgraded to 4.14.20, the computer hung with the HDD LED lit solid (not blinking) as soon as I logged in (i.e. as soon as /home was accessed). I was able to capture the dmesg output, which shows an OOPS. The machine hung as soon as any I/O tried to access /home, and a clean poweroff was not possible.

[ 14.226693] ata1.00: exception Emask 0x10 SAct 0x10000 SErr 0x400101 action 0x6 frozen
[ 14.226694] ata1.00: irq_stat 0x08000000, interface fatal error
[ 14.226695] ata1: SError: { RecovData UnrecovData Handshk }
[ 14.226696] ata1.00: failed command: WRITE FPDMA QUEUED
[ 14.226698] ata1.00: cmd 61/e0:80:58:10:5c/00:00:0a:00:00/40 tag 16 ncq dma 114688 out
[ 14.226699] ata1.00: status: { DRDY }
[ 14.226701] ata1: hard resetting link
[ 14.537049] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 14.538626] ata1.00: supports DRM functions and may not be fully accessible
[ 14.539258] ata1.00: disabling queued TRIM support
[ 14.540875] ata1.00: supports DRM functions and may not be fully accessible
[ 14.541376] ata1.00: disabling queued TRIM support
[ 14.542646] ata1.00: configured for UDMA/133
[ 14.542662] ata1: EH complete
[ 15.036750] ------------[ cut here ]------------
[ 15.036764] WARNING: CPU: 0 PID: 0 at /home/buildozer/aports/main/linux-vanilla/src/linux-4.14/kernel/rcu/tree.c:2725 rcu_process_callbacks+0x370/0x421
[ 15.036766] Modules linked in: ctr ccm nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_nat nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c crc32c_generic ipt_REJECT nf_reject_ipv4 xt_tcpudp br_netfilter bridge stp llc ebtable_filter ebtables overlay exportfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables ipv6 bnep joydev mousedev hid_logitech_hidpp hid_logitech_dj nls_utf8 nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic arc4 input_leds btusb btrtl btbcm btintel bluetooth hid_apple hid_generic ecdh_generic iwlmvm mac80211 crct10dif_pclmul ghash_clmulni_intel iwlwifi pcbc coretemp tun deadline_iosched cfg80211 snd_hda_intel aesni_intel
[ 15.036845] igb kvm_intel aes_x86_64 snd_hda_codec hwmon crypto_simd kvm dca glue_helper rfkill snd_hda_core cryptd irqbypass intel_cstate snd_hwdep intel_rapl_perf snd_pcm af_packet snd_timer e1000e psmouse snd serio_raw ptp soundcore pcspkr iTCO_wdt pps_core shpchp iTCO_vendor_support mei_me mei intel_pch_thermal thermal fan evdev efivarfs raid10 usbhid hid dm_mod dax crc32_pclmul crc32c_intel ahci libahci libata i2c_i801 xhci_pci xhci_hcd usbcore i915 video drm_kms_helper drm intel_gtt agpgart i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core nvme nvme_core wmi button loop raid1 ext4 crc16 mbcache jbd2 sd_mod scsi_mod
[ 15.036928] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.20-0-vanilla #1-Alpine
[ 15.036930] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z170N-WIFI-CF, BIOS F6 10/26/2015
[ 15.036933] task: fffffffface124c0 task.stack: fffffffface00000
[ 15.036938] RIP: 0010:rcu_process_callbacks+0x370/0x421
[ 15.036941] RSP: 0018:ffff8ff771c03f00 EFLAGS: 00010002
[ 15.036944] RAX: 0000000000000000 RBX: ffff8ff771c21940 RCX: 000000018040003b
[ 15.036947] RDX: ffffffffffffd801 RSI: ffff8ff771c03f10 RDI: ffff8ff771c21978
[ 15.036949] RBP: fffffffface4f340 R08: 0000000000000001 R09: 0000000000000100
[ 15.036951] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8ff771c21978
[ 15.036954] R13: fffffffface124c0 R14: 7fffffffffffffff R15: fffffffffffffff9
[ 15.036957] FS: 0000000000000000(0000) GS:ffff8ff771c00000(0000) knlGS:0000000000000000
[ 15.036959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.036962] CR2: 00007f0ba7bdf078 CR3: 00000002afe0a006 CR4: 00000000003606b0
[ 15.036964] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 15.036966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 15.036968] Call Trace:
[ 15.036973] <IRQ>
[ 15.036982] ? rebalance_domains+0xf2/0x237
[ 15.036988] __do_softirq+0xfb/0x268
[ 15.036995] ? sched_clock+0x5/0x8
[ 15.037000] irq_exit+0x62/0xa1
[ 15.037005] smp_apic_timer_interrupt+0xbf/0xf7
[ 15.037009] apic_timer_interrupt+0x98/0xa0
[ 15.037012] </IRQ>
[ 15.037017] RIP: 0010:cpuidle_enter_state+0x149/0x20d
[ 15.037019] RSP: 0018:fffffffface03e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 15.037023] RAX: ffff8ff771c20c40 RBX: ffff8ff771c28fc8 RCX: 000000000000001f
[ 15.037025] RDX: 000000038041cb3c RSI: 0000000000000000 RDI: 0000000000000000
[ 15.037027] RBP: 0000000000000005 R08: 00000666643562fe R09: 0000000000000014
[ 15.037029] R10: fffffffface03e60 R11: 000000000000165c R12: 0000000000000000
[ 15.037032] R13: fffffffface124c0 R14: fffffffface7ff60 R15: fffffffface80158
[ 15.037041] do_idle+0x11a/0x190
[ 15.037046] cpu_startup_entry+0x6f/0x71
[ 15.037052] start_kernel+0x44d/0x46d
[ 15.037064] secondary_startup_64+0xa5/0xb0
[ 15.037070] Code: 8b 93 90 00 00 00 48 2b 15 6f 25 da 00 48 39 d0 7d 07 48 89 83 90 00 00 00 48 83 7b 38 00 0f 94 c2 48 85 c0 0f 94 c0 38 c2 74 02 <0f> ff 48 8b 3c 24 57 9d 0f 1f 44 00 00 4c 89 e7 e8 e5 26 00 00
[ 15.037142] ---[ end trace 486866c4a775ee92 ]---
[ 22.250602] logitech-hidpp-device 0003:046D:4060.0008: HID++ 4.5 device connected.

This does not happen with 4.14.19, so I went through the commit log and tried reverting commit c561093ed6843684690436dea034af53b462cfe5 ("scsi: core: Ensure that the SCSI error handler gets woken up"). With that commit reverted, the machine works normally again.
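For reference when reading the logs: the SErr value libata prints in the "exception" line is the raw SATA SError register, and the names in the following "SError: { ... }" line are its set bits. The minimal Python sketch below decodes such a value. The bit-to-name table is my transcription of the SERR_* constants in include/linux/libata.h and the labels libata's error handler prints, so treat it as an assumption and check it against your own kernel tree; decode_serror is a hypothetical helper, not a kernel or libata API.

```python
from __future__ import annotations

# Assumed SError bit positions and the short names libata prints for them
# (transcribed from SERR_* in include/linux/libata.h; verify against your tree).
SERROR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serror(serr: int) -> list[str]:
    """Hypothetical helper: names of the SError bits set in serr, lowest bit first."""
    return [name for bit, name in SERROR_BITS if serr & (1 << bit)]


if __name__ == "__main__":
    # SErr values taken from the dmesg logs in this report.
    for value in (0x400101, 0x400100):
        print(f"SErr {value:#x}: {{ {' '.join(decode_serror(value))} }}")
```

Applied to the values in this report, it reproduces what libata printed: 0x400101 decodes to RecovData, UnrecovData and Handshk, and 0x400100 (the second error after the revert) to UnrecovData and Handshk.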
Linux ncopa-desktop 4.14.20-1-vanilla #2-Alpine SMP Wed Feb 21 15:11:34 CET 2018 x86_64 GNU/Linux

dmesg output from my kernel with that commit reverted:

[ 20.910014] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400101 action 0x6 frozen
[ 20.910018] ata1.00: irq_stat 0x08000000, interface fatal error
[ 20.910022] ata1: SError: { RecovData UnrecovData Handshk }
[ 20.910026] ata1.00: failed command: WRITE DMA
[ 20.910036] ata1.00: cmd ca/00:08:00:d0:5e/00:00:00:00:00/ed tag 2 dma 4096 out
[ 20.910039] ata1.00: status: { DRDY }
[ 20.910048] ata1: hard resetting link
[ 21.221989] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 21.223512] ata1.00: supports DRM functions and may not be fully accessible
[ 21.225118] ata1.00: supports DRM functions and may not be fully accessible
[ 21.226387] ata1.00: configured for UDMA/133
[ 21.226402] ata1: EH complete
[ 21.259887] ata1: limiting SATA link speed to 3.0 Gbps
[ 21.259898] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[ 21.259902] ata1.00: irq_stat 0x08000000, interface fatal error
[ 21.259908] ata1: SError: { UnrecovData Handshk }
[ 21.259914] ata1.00: failed command: WRITE DMA
[ 21.259929] ata1.00: cmd ca/00:e0:20:10:5c/00:00:00:00:00/ea tag 3 dma 114688 out
[ 21.259933] ata1.00: status: { DRDY }
[ 21.259946] ata1: hard resetting link
[ 21.571618] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 21.573260] ata1.00: supports DRM functions and may not be fully accessible
[ 21.575300] ata1.00: supports DRM functions and may not be fully accessible
[ 21.576892] ata1.00: configured for UDMA/133
[ 21.576913] ata1: EH complete
[ 22.124450] logitech-hidpp-device 0003:046D:4060.0008: HID++ 4.5 device connected.

So the problem seems to be triggered when one or more ATA errors occur. I have seen these errors on the 4.9 kernel too, but they did not seem to do any harm until 4.14.20.
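The "cmd" lines for the failed writes can be unpacked in the same spirit. Assuming libata's usual layout for that field (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), the minimal sketch below recovers the transfer size, NCQ tag and LBA from the three failed commands in this report. decode_taskfile is a hypothetical helper; it only handles the two commands seen here (0x61 WRITE FPDMA QUEUED and 0xCA WRITE DMA) and assumes 512-byte sectors.

```python
from __future__ import annotations

import re

# Assumed layout of libata's "cmd" taskfile string:
# CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
_HEX = r"([0-9a-f]{2})"
TF_RE = re.compile(
    rf"{_HEX}/{_HEX}:{_HEX}:{_HEX}:{_HEX}:{_HEX}/{_HEX}:{_HEX}:{_HEX}:{_HEX}:{_HEX}/{_HEX}"
)

ATA_CMD_FPDMA_WRITE = 0x61  # WRITE FPDMA QUEUED (NCQ)
ATA_CMD_WRITE_DMA = 0xCA    # WRITE DMA (28-bit LBA)
SECTOR_SIZE = 512           # assumption: 512-byte logical sectors


def decode_taskfile(tf: str) -> dict:
    """Hypothetical helper: size, NCQ tag and LBA from a libata 'cmd' string."""
    m = TF_RE.fullmatch(tf.lower())
    if m is None:
        raise ValueError(f"unrecognised taskfile: {tf!r}")
    (cmd, feat, nsect, lbal, lbam, lbah,
     hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = (
        int(g, 16) for g in m.groups())

    if cmd == ATA_CMD_FPDMA_WRITE:
        # NCQ: sector count is in FEATURE (0 means 65536), tag in NSECT bits 7:3,
        # 48-bit LBA split across the current and HOB registers.
        sectors = ((hob_feat << 8) | feat) or 65536
        tag = nsect >> 3
        lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal)
    elif cmd == ATA_CMD_WRITE_DMA:
        # 28-bit command: count 0 means 256 sectors, LBA bits 27:24 in DEVICE.
        # The "tag" libata prints here is its own queue slot, not a taskfile field.
        sectors = nsect or 256
        tag = None
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    else:
        raise ValueError(f"command {cmd:#04x} is not handled by this sketch")

    return {"command": cmd, "sectors": sectors,
            "bytes": sectors * SECTOR_SIZE, "tag": tag, "lba": lba}


if __name__ == "__main__":
    # The three failed commands from the dmesg logs in this report.
    for tf in ("61/e0:80:58:10:5c/00:00:0a:00:00/40",
               "ca/00:08:00:d0:5e/00:00:00:00:00/ed",
               "ca/00:e0:20:10:5c/00:00:00:00:00/ea"):
        print(tf, decode_taskfile(tf))
```

For the NCQ write in the first log this yields 224 sectors (114688 bytes), tag 16 and LBA 0x0a5c1058, matching "tag 16 ncq dma 114688 out". The two WRITE DMA commands after the revert decode to 4096 and 114688 bytes, again matching the printed sizes; their tags are not encoded in the taskfile, so the sketch returns None for them.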