[Bug 198861] New: Regression causes kernel OOPS and hang in SCSI error report

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=198861

            Bug ID: 198861
           Summary: Regression causes kernel OOPS and hang in SCSI error
                    report
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 4.14.20
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@xxxxxxxxxxxxxxx
          Reporter: ncopa@xxxxxxxxxxxxxxx
        Regression: No

I have my ext4 /home dir mounted on lvm partition /dev/vg1/lv_home which is on
mdadm raid10 device with 3 samsung evo 850 SSD disks.

After I upgraded to 4.14.20 the computer hang with the HDD led light shining
full (not blinking) as soon as I logged in (eg accesed the /home partition).

I was able to get a dmesg out which shows an OOPS. The machine hanged as sonn
any IO tried to access /home and a clean poweroff was not possible.

[   14.226693] ata1.00: exception Emask 0x10 SAct 0x10000 SErr 0x400101 action
0x6 frozen
[   14.226694] ata1.00: irq_stat 0x08000000, interface fatal error
[   14.226695] ata1: SError: { RecovData UnrecovData Handshk }
[   14.226696] ata1.00: failed command: WRITE FPDMA QUEUED
[   14.226698] ata1.00: cmd 61/e0:80:58:10:5c/00:00:0a:00:00/40 tag 16 ncq dma
114688 out
[   14.226699] ata1.00: status: { DRDY }
[   14.226701] ata1: hard resetting link
[   14.537049] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   14.538626] ata1.00: supports DRM functions and may not be fully accessible
[   14.539258] ata1.00: disabling queued TRIM support
[   14.540875] ata1.00: supports DRM functions and may not be fully accessible
[   14.541376] ata1.00: disabling queued TRIM support
[   14.542646] ata1.00: configured for UDMA/133
[   14.542662] ata1: EH complete
[   15.036750] ------------[ cut here ]------------
[   15.036764] WARNING: CPU: 0 PID: 0 at
/home/buildozer/aports/main/linux-vanilla/src/linux-4.14/kernel/rcu/tree.c:2725
rcu_process_callbacks+0x370/0x421
[   15.036766] Modules linked in: ctr ccm nf_conntrack_netlink nfnetlink
xfrm_user xfrm_algo xt_addrtype xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_nat
nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c crc32c_generic ipt_REJECT
nf_reject_ipv4 xt_tcpudp br_netfilter bridge stp llc ebtable_filter ebtables
overlay exportfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables
ipv6 bnep joydev mousedev hid_logitech_hidpp hid_logitech_dj nls_utf8 nls_cp437
vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic arc4
input_leds btusb btrtl btbcm btintel bluetooth hid_apple hid_generic
ecdh_generic iwlmvm mac80211 crct10dif_pclmul ghash_clmulni_intel iwlwifi pcbc
coretemp tun deadline_iosched cfg80211 snd_hda_intel aesni_intel
[   15.036845]  igb kvm_intel aes_x86_64 snd_hda_codec hwmon crypto_simd kvm
dca glue_helper rfkill snd_hda_core cryptd irqbypass intel_cstate snd_hwdep
intel_rapl_perf snd_pcm af_packet snd_timer e1000e psmouse snd serio_raw ptp
soundcore pcspkr iTCO_wdt pps_core shpchp iTCO_vendor_support mei_me mei
intel_pch_thermal thermal fan evdev efivarfs raid10 usbhid hid dm_mod dax
crc32_pclmul crc32c_intel ahci libahci libata i2c_i801 xhci_pci xhci_hcd
usbcore i915 video drm_kms_helper drm intel_gtt agpgart i2c_algo_bit
fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core nvme nvme_core wmi
button loop raid1 ext4 crc16 mbcache jbd2 sd_mod scsi_mod
[   15.036928] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.20-0-vanilla
#1-Alpine
[   15.036930] Hardware name: Gigabyte Technology Co., Ltd. To be filled by
O.E.M./Z170N-WIFI-CF, BIOS F6 10/26/2015
[   15.036933] task: fffffffface124c0 task.stack: fffffffface00000
[   15.036938] RIP: 0010:rcu_process_callbacks+0x370/0x421
[   15.036941] RSP: 0018:ffff8ff771c03f00 EFLAGS: 00010002
[   15.036944] RAX: 0000000000000000 RBX: ffff8ff771c21940 RCX:
000000018040003b
[   15.036947] RDX: ffffffffffffd801 RSI: ffff8ff771c03f10 RDI:
ffff8ff771c21978
[   15.036949] RBP: fffffffface4f340 R08: 0000000000000001 R09:
0000000000000100
[   15.036951] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff8ff771c21978
[   15.036954] R13: fffffffface124c0 R14: 7fffffffffffffff R15:
fffffffffffffff9
[   15.036957] FS:  0000000000000000(0000) GS:ffff8ff771c00000(0000)
knlGS:0000000000000000
[   15.036959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.036962] CR2: 00007f0ba7bdf078 CR3: 00000002afe0a006 CR4:
00000000003606b0
[   15.036964] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   15.036966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   15.036968] Call Trace:
[   15.036973]  <IRQ>
[   15.036982]  ? rebalance_domains+0xf2/0x237
[   15.036988]  __do_softirq+0xfb/0x268
[   15.036995]  ? sched_clock+0x5/0x8
[   15.037000]  irq_exit+0x62/0xa1
[   15.037005]  smp_apic_timer_interrupt+0xbf/0xf7
[   15.037009]  apic_timer_interrupt+0x98/0xa0
[   15.037012]  </IRQ>
[   15.037017] RIP: 0010:cpuidle_enter_state+0x149/0x20d
[   15.037019] RSP: 0018:fffffffface03e80 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff10
[   15.037023] RAX: ffff8ff771c20c40 RBX: ffff8ff771c28fc8 RCX:
000000000000001f
[   15.037025] RDX: 000000038041cb3c RSI: 0000000000000000 RDI:
0000000000000000
[   15.037027] RBP: 0000000000000005 R08: 00000666643562fe R09:
0000000000000014
[   15.037029] R10: fffffffface03e60 R11: 000000000000165c R12:
0000000000000000
[   15.037032] R13: fffffffface124c0 R14: fffffffface7ff60 R15:
fffffffface80158
[   15.037041]  do_idle+0x11a/0x190
[   15.037046]  cpu_startup_entry+0x6f/0x71
[   15.037052]  start_kernel+0x44d/0x46d
[   15.037064]  secondary_startup_64+0xa5/0xb0
[   15.037070] Code: 8b 93 90 00 00 00 48 2b 15 6f 25 da 00 48 39 d0 7d 07 48
89 83 90 00 00 00 48 83 7b 38 00 0f 94 c2 48 85 c0 0f 94 c0 38 c2 74 02 <0f> ff
48 8b 3c 24 57 9d 0f 1f 44 00 00 4c 89 e7 e8 e5 26 00 00 
[   15.037142] ---[ end trace 486866c4a775ee92 ]---
[   22.250602] logitech-hidpp-device 0003:046D:4060.0008: HID++ 4.5 device
connected.



This does not happen with 4.14.19, so I looked at the commit log and tried
revert commit c561093ed6843684690436dea034af53b462cfe5 (scsi: core: Ensure that
the SCSI error handler gets woken up).

With that commit reverted machine again worked as normal.

Linux ncopa-desktop 4.14.20-1-vanilla #2-Alpine SMP Wed Feb 21 15:11:34 CET
2018 x86_64 GNU/Linux

dmesg output from my kernel with the regression reverted:

[   20.910014] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400101 action 0x6
frozen
[   20.910018] ata1.00: irq_stat 0x08000000, interface fatal error
[   20.910022] ata1: SError: { RecovData UnrecovData Handshk }
[   20.910026] ata1.00: failed command: WRITE DMA
[   20.910036] ata1.00: cmd ca/00:08:00:d0:5e/00:00:00:00:00/ed tag 2 dma 4096
out
[   20.910039] ata1.00: status: { DRDY }
[   20.910048] ata1: hard resetting link
[   21.221989] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   21.223512] ata1.00: supports DRM functions and may not be fully accessible
[   21.225118] ata1.00: supports DRM functions and may not be fully accessible
[   21.226387] ata1.00: configured for UDMA/133
[   21.226402] ata1: EH complete
[   21.259887] ata1: limiting SATA link speed to 3.0 Gbps
[   21.259898] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6
frozen
[   21.259902] ata1.00: irq_stat 0x08000000, interface fatal error
[   21.259908] ata1: SError: { UnrecovData Handshk }
[   21.259914] ata1.00: failed command: WRITE DMA
[   21.259929] ata1.00: cmd ca/00:e0:20:10:5c/00:00:00:00:00/ea tag 3 dma
114688 out
[   21.259933] ata1.00: status: { DRDY }
[   21.259946] ata1: hard resetting link
[   21.571618] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   21.573260] ata1.00: supports DRM functions and may not be fully accessible
[   21.575300] ata1.00: supports DRM functions and may not be fully accessible
[   21.576892] ata1.00: configured for UDMA/133
[   21.576913] ata1: EH complete
[   22.124450] logitech-hidpp-device 0003:046D:4060.0008: HID++ 4.5 device
connected.

So problem seems to happen when there are one or more ata errors.

I have seen those errors on 4.9 kernel too, but they don't seem to do any harm,
until 4.14.20 kernel.

-- 
You are receiving this mail because:
You are the assignee for the bug.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux