Re: sata_mv, io stucks

Mark Lord wrote:
ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 tag 0 ncq 4096 out
        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Yeah, I see what I was missing earlier:   "(timeout)".
So it's "none of" the driver paths.

This could very well be due to one/several of the as-yet un-addressed
chipset errata for the 6081.  Someday we'll have software workarounds
for those, but I'm (still) waiting on Marvell for stuff.

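For reference, the taskfile bytes in the quoted "cmd" line can be decoded with a few lines of Python. This is only a sketch: the field order is assumed to follow the format string libata's error handler uses for this message, the opcode table is deliberately tiny, and decode_cmd is a made-up helper name, not anything from the kernel.

```python
import re

# Field order assumed from libata's error-handler message format
# (command/feature:nsect:lbal:lbam:lbah/hob_*.../device).
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
          "device")

# Deliberately incomplete opcode table: only the two NCQ data commands.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_cmd(line):
    """Decode the 12 taskfile bytes of a libata 'cmd' line (hypothetical helper)."""
    m = re.search(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})", line)
    if m is None:
        raise ValueError("no taskfile found in line")
    values = [int(b, 16) for b in re.split(r"[/:]", m.group(1))]
    tf = dict(zip(FIELDS, values))

    # 48-bit LBA, most significant byte first.
    lba = 0
    for name in ("hob_lbah", "hob_lbam", "hob_lbal", "lbah", "lbam", "lbal"):
        lba = (lba << 8) | tf[name]

    is_ncq = tf["command"] in NCQ_OPCODES
    if is_ncq:
        # NCQ commands carry the sector count in the feature registers
        # and the tag in bits 7:3 of the sector count register.
        sectors = (tf["hob_feature"] << 8) | tf["feature"]
        tag = tf["nsect"] >> 3
    else:
        sectors = (tf["hob_nsect"] << 8) | tf["nsect"]
        tag = None

    return {
        "name": NCQ_OPCODES.get(tf["command"], "opcode 0x%02x" % tf["command"]),
        "lba": lba,
        "sectors": sectors,
        "tag": tag,
    }


if __name__ == "__main__":
    print(decode_cmd("ata14.00: cmd 61/08:00:3f:52:54/00:00:57:00:00/40 "
                     "tag 0 ncq 4096 out"))
```

Under those assumptions, the quoted line decodes to a WRITE FPDMA QUEUED of 8 sectors (4096 bytes, matching "ncq 4096 out") at LBA 1465143871 with tag 0, i.e. the command that timed out was a write.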

After a bit of testing, it seems that writing is required to trigger the bug. dstat output follows:

--dsk/sde-----dsk/sdf-----dsk/sdg-----dsk/sdh-----dsk/sdi-----dsk/sdj-----dsk/sdk--
 read  writ: read  writ: read  writ: read  writ: read  writ: read  writ: read  writ
  37M     0:  35M     0:  35M     0:  37M     0:  34M     0:  35M     0:  32M     0
  35M     0:  34M     0:  34M     0:  35M     0:  37M     0:  37M     0:  36M     0
  34M     0:  35M     0:  35M     0:  40M     0:  36M     0:  33M     0:  35M     0
  30M 8192B:  28M 8192B:  30M 8192B:  30M     0:  28M 8192B:  30M 8192B:  28M 8192B
  35M     0:  37M     0:  33M     0:    0     0:  36M     0:  34M     0:  35M     0
  36M     0:  35M     0:  35M     0:    0     0:  35M     0:  34M     0:  34M     0
  34M     0:  37M     0:  38M     0:    0     0:  36M     0:  36M     0:  35M     0

I was running fio, reading from all drives connected to the 6081. Nothing happened for a while, so I decided to mount the XFS filesystem read-write, and it hung immediately, before the mount had even completed.
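To spot the stalled drive in longer dstat captures without eyeballing them, something along these lines could be used. It is a minimal sketch, assuming the rows use ":" between per-disk "read writ" pairs as in the output above; parse_size, parse_row and stalled_disks are illustrative names, not part of dstat.

```python
# Unit suffixes as they appear in dstat's compact output (assumed 1024-based).
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(token):
    """Convert a dstat value such as '37M', '8192B' or '0' to bytes."""
    if token[-1] in UNITS:
        return int(float(token[:-1]) * UNITS[token[-1]])
    return int(token)


def parse_row(row):
    """Split one dstat row into a list of (read, write) byte counts."""
    pairs = []
    for column in row.split(":"):
        read, write = column.split()
        pairs.append((parse_size(read), parse_size(write)))
    return pairs


def stalled_disks(disks, rows):
    """Map each disk to the first row index where it did no I/O
    while at least one other disk was still reading."""
    stalled = {}
    for index, row in enumerate(rows):
        pairs = parse_row(row)
        others_busy = any(read > 0 for read, _ in pairs)
        for name, (read, write) in zip(disks, pairs):
            if others_busy and read == 0 and write == 0 and name not in stalled:
                stalled[name] = index
    return stalled


if __name__ == "__main__":
    disks = ["sde", "sdf", "sdg", "sdh", "sdi", "sdj", "sdk"]
    rows = [
        "34M 0 : 35M 0 : 35M 0 : 40M 0 : 36M 0 : 33M 0 : 35M 0",
        "30M 8192B: 28M 8192B: 30M 8192B: 30M 0 : 28M 8192B: 30M 8192B: 28M 8192B",
        "35M 0 : 37M 0 : 33M 0 : 0 0 : 36M 0 : 34M 0 : 35M 0",
    ]
    print(stalled_disks(disks, rows))
```

Fed the rows above, it flags sdh, the one drive that shows no 8192B write in the sample where the others do and that reads nothing afterwards.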

I also managed to catch the panic I mentioned, running kernel 2.6.28-rc5:

[  503.918122] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  503.918399] IP: [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.918561] PGD 229068067 PUD 22a1f0067 PMD 0
[  503.918814] Oops: 0000 [#1] SMP
[  503.919009] last sysfs file: /sys/block/sdk/stat
[  503.919123] CPU 2
[  503.919273] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[  503.920074] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #4
[  503.920190] RIP: 0010:[<ffffffff804d3938>] [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.920417] RSP: 0018:ffff88022f0f3e60  EFLAGS: 00010046
[  503.920540] RAX: ffff88022d4f5470 RBX: 0000000000000000 RCX: ffff88022d4f5ac8
[  503.920659] RDX: ffff88022d4f57e8 RSI: 0000000000000eae RDI: ffff8801f8188848
[  503.920777] RBP: ffff88022d4f5988 R08: 0000000000000000 R09: 0000000000000000
[  503.920897] R10: ffffffff804d6142 R11: ffffffff805dc480 R12: ffff88022f0e4000
[  503.921015] R13: ffff88022d4f57e8 R14: 0000000000000000 R15: ffff88022d4f5470
[  503.921134] FS: 0000000000000000(0000) GS:ffff88022f08bac0(0000) knlGS:0000000000000000
[  503.921317] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  503.921434] CR2: 0000000000000000 CR3: 000000022a0cf000 CR4: 00000000000026e0
[  503.921553] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  503.921674] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  503.921793] Process swapper (pid: 0, threadinfo ffff88022f0ee000, task ffff88022f0e2c30)
[  503.921985] Stack:
[  503.922094]  ffff8801f8188848 ffffffff80416eee ffff8801f8188848 ffffffff80416fea
[  503.922116]  0000000000000282 ffff88022d4f5470 0000000000000100 ffff88022f0e4000
[  503.922116]  ffff88022f0f3ee0 ffffffff80416f30 ffff88022f0e5018 ffffffff8024393b
[  503.922116] Call Trace:
[  503.922116]  <IRQ> <0> [<ffffffff80416eee>] ? blk_rq_timed_out+0xe/0x50
[  503.922116]  [<ffffffff80416fea>] ? blk_rq_timed_out_timer+0xba/0x120
[  503.922116]  [<ffffffff80416f30>] ? blk_rq_timed_out_timer+0x0/0x120
[  503.922116]  [<ffffffff8024393b>] ? run_timer_softirq+0x1bb/0x230
[  503.922116]  [<ffffffff8023f00b>] ? __do_softirq+0x8b/0x150
[  503.922116]  [<ffffffff8020e7db>] ? profile_pc+0x3b/0x80
[  503.922116]  [<ffffffff8020c8fc>] ? call_softirq+0x1c/0x40
[  503.922116]  [<ffffffff8020db55>] ? do_softirq+0x35/0x70
[  503.922116]  [<ffffffff802205b5>] ? smp_apic_timer_interrupt+0x85/0xd0
[  503.922116]  [<ffffffff8020c34b>] ? apic_timer_interrupt+0x6b/0x70
[  503.922116]  <EOI> <0> [<ffffffff805dc480>] ? udp_poll+0x0/0x150
[  503.922116]  [<ffffffff80212d8c>] ? mwait_idle+0x3c/0x40
[  503.922116]  [<ffffffff80209d5a>] ? cpu_idle+0x3a/0x70
[  503.922116] Code: 18 4c 8b 74 24 20 48 83 c4 28 c3 be 06 00 00 00 48 89 df e8 9b c8 ff ff 85 c0 75 c3 eb 87 0f 1f 44 00 00 53 48 8b 9f e0 00 00 00 <48> 8b 03 48
[  503.922116] RIP  [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.922116]  RSP <ffff88022f0f3e60>
[  503.922116] CR2: 0000000000000000
[  503.922116] Kernel panic - not syncing: Fatal exception in interrupt
[  503.922116] ------------[ cut here ]------------
[  503.922116] WARNING: at kernel/smp.c:333 smp_call_function_mask+0x236/0x240()
[  503.922116] Modules linked in: kvm_intel kvm coretemp w83627hf w83793 hwmon_vid hwmon nf_conntrack_ftp 3c59x i2c_i801 i2c_core e100 iTCO_wdt
[  503.922116] Pid: 0, comm: swapper Tainted: G      D    2.6.28-rc5 #4
[  503.922116] Call Trace:
[  503.922116]  <IRQ>  [<ffffffff80239ea4>] warn_on_slowpath+0x64/0xa0
[  503.922116]  [<ffffffff80252396>] up+0x16/0x50
[  503.922116]  [<ffffffff8023a657>] release_console_sem+0x197/0x1e0
[  503.922116]  [<ffffffff8025c126>] smp_call_function_mask+0x236/0x240
[  503.922116]  [<ffffffff8023b0fe>] printk+0x4e/0x60
[  503.922116]  [<ffffffff80252396>] up+0x16/0x50
[  503.922116]  [<ffffffff8021f290>] native_smp_send_stop+0x20/0x30
[  503.922116]  [<ffffffff80239f7e>] panic+0x8e/0x150
[  503.922116]  [<ffffffff8020e582>] show_registers+0x192/0x250
[  503.922116]  [<ffffffff8047d745>] do_unblank_screen+0x15/0x140
[  503.922116]  [<ffffffff80636370>] oops_end+0xa0/0xb0
[  503.922116]  [<ffffffff80637f43>] do_page_fault+0x6a3/0x830
[  503.922116]  [<ffffffff80635799>] error_exit+0x0/0x51
[  503.922116]  [<ffffffff805dc480>] udp_poll+0x0/0x150
[  503.922116]  [<ffffffff804d6142>] scsi_request_fn+0xe2/0x400
[  503.922116]  [<ffffffff804d3938>] scsi_times_out+0x8/0x70
[  503.922116]  [<ffffffff80416eee>] blk_rq_timed_out+0xe/0x50
[  503.922116]  [<ffffffff80416fea>] blk_rq_timed_out_timer+0xba/0x120
[  503.922116]  [<ffffffff80416f30>] blk_rq_timed_out_timer+0x0/0x120
[  503.922116]  [<ffffffff8024393b>] run_timer_softirq+0x1bb/0x230
[  503.922116]  [<ffffffff8023f00b>] __do_softirq+0x8b/0x150
[  503.922116]  [<ffffffff8020e7db>] profile_pc+0x3b/0x80
[  503.922116]  [<ffffffff8020c8fc>] call_softirq+0x1c/0x40
[  503.922116]  [<ffffffff8020db55>] do_softirq+0x35/0x70
[  503.922116]  [<ffffffff802205b5>] smp_apic_timer_interrupt+0x85/0xd0
[  503.922116]  [<ffffffff8020c34b>] apic_timer_interrupt+0x6b/0x70
[  503.922116]  <EOI>  [<ffffffff805dc480>] udp_poll+0x0/0x150
[  503.922116]  [<ffffffff80212d8c>] mwait_idle+0x3c/0x40
[  503.922116]  [<ffffffff80209d5a>] cpu_idle+0x3a/0x70
[  503.922116] ---[ end trace 3eef0898db52fd7a ]---
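As a rough aid for reading the first oops, the "Code:" line can be split at the <..> marker, which flags the byte at the faulting RIP. The sketch below does just that; split_code_line is a hypothetical helper name, and the offset 0x8 is taken from "scsi_times_out+0x8/0x70" in the trace.

```python
def split_code_line(line):
    """Split an oops 'Code:' line into (bytes before RIP, bytes from RIP on).

    The byte wrapped in <..> is the first byte of the faulting instruction.
    """
    tokens = line.split("Code:", 1)[1].split()
    before, after = [], []
    at_fault = False
    for token in tokens:
        if token.startswith("<"):
            at_fault = True
            token = token.strip("<>")
        (after if at_fault else before).append(int(token, 16))
    return bytes(before), bytes(after)


if __name__ == "__main__":
    code = ("Code: 18 4c 8b 74 24 20 48 83 c4 28 c3 be 06 00 00 00 48 89 df "
            "e8 9b c8 ff ff 85 c0 75 c3 eb 87 0f 1f 44 00 00 53 48 8b 9f e0 "
            "00 00 00 <48> 8b 03 48")
    before, after = split_code_line(code)
    offset = 0x8  # from scsi_times_out+0x8/0x70
    # The last `offset` bytes before the marker are the start of the function.
    print("function prologue:", before[-offset:].hex(" "))
    print("faulting bytes:   ", after.hex(" "))
```

Decoded by hand (worth double-checking with objdump or the kernel's scripts/decodecode), the eight prologue bytes 53 48 8b 9f e0 00 00 00 look like "push %rbx; mov 0xe0(%rdi),%rbx", and the faulting 48 8b 03 like "mov (%rbx),%rax". With RBX and CR2 both zero in the register dump, that would mean the pointer loaded from offset 0xe0 of the first argument was NULL.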


--
Harri.