sata_svw: intermittent crashes with BCM5785 (HT1000) controller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, I've experienced some intermittent crashes with the sata_svw driver
in kernel 2.6.16.1 (Debian kernel revision 2.6.16-4).  Four Maxtor
6V300F0 SATA drives are attached to the onboard controller on a
Supermicro H8DSP-8 motherboard, and these drives have passed a few dozen
badblocks scans flawlessly (four badblocks processes running
simultaneously, one on each drive).

There are a few md software raid devices set up and the problem usually
manifests itself by kicking devices from the md arrays.

The machine will usually crash (well, the kernel doesn't go down, but
I'm unable to login remotely) when there's little or no load on the box,
and here's what I'll see in my logs:

Apr 14 17:16:34 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:16:34 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:16:34 pcap last message repeated 3 times
Apr 14 17:16:44 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:16:44 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:16:44 pcap last message repeated 2 times
Apr 14 17:16:54 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:16:54 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:16:54 pcap last message repeated 2 times
Apr 14 17:17:04 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:17:24 pcap kernel: PGD 7f7c3067 PUD 7f35b067 PMD 0 
Apr 14 17:17:24 pcap kernel: CPU 0 
Apr 14 17:17:24 pcap kernel: Modules linked in: ipv6 button ac battery iptable_filter ip_tables x_tables cpufreq_userspace powernow_k8 freq_table evdev psmouse serio_raw pcspkr shpchp pci_hotplug ext3 jbd mbcache raid5 xor raid1 md_mod ide_generic ide_cd cdrom sd_mod sata_svw libata scsi_mod e1000 tg3 ehci_hcd ohci_hcd serverworks generic ide_core thermal processor fan
Apr 14 17:17:24 pcap kernel: Pid: 1239, comm: ata/0 Not tainted 2.6.16-1-amd64-k8-smp #1
Apr 14 17:17:24 pcap kernel: RIP: 0010:[_end+130480710/2132488192] <ffffffff880bca46>{:libata:ata_pio_poll+186}
Apr 14 17:17:24 pcap kernel: RSP: 0018:ffff81007de71e18  EFLAGS: 00010287
Apr 14 17:17:24 pcap kernel: RAX: 0000000112dfe101 RBX: ffff81007e373480 RCX: ffffffff80347988
Apr 14 17:17:24 pcap kernel: RDX: ffffffff80347988 RSI: 0000000000000246 RDI: ffff81007e373480
Apr 14 17:17:24 pcap kernel: RBP: 0000000000000000 R08: ffffffff80347988 R09: 0000000000000009
Apr 14 17:17:24 pcap kernel: R10: 0000000100000000 R11: 0000000000000246 R12: 0000000000000004
Apr 14 17:17:24 pcap kernel: R13: 0000000000000002 R14: ffff81007e373480 R15: ffffffff880bd1f1
Apr 14 17:17:24 pcap kernel: FS:  00002b1da7f34640(0000) GS:ffffffff8040a000(0000) knlGS:0000000000000000
Apr 14 17:17:24 pcap kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 14 17:17:24 pcap kernel: CR2: 00000000000000d0 CR3: 000000007f08b000 CR4: 00000000000006e0
Apr 14 17:17:24 pcap kernel: Process ata/0 (pid: 1239, threadinfo ffff81007de70000, task ffff81007ffc47a0)
Apr 14 17:17:24 pcap kernel: Stack: ffff81007e373480 ffff81007e373480 0000000000000000 ffff81007e0bedc0 
Apr 14 17:17:24 pcap kernel:        0000000000000206 ffffffff880bd238 ffff81007e0bedd8 ffff81007e373b30 
Apr 14 17:17:24 pcap kernel:        ffff81007e373b38 ffffffff8013afe6 
Apr 14 17:17:24 pcap kernel: Call Trace: <ffffffff880bd238>{:libata:ata_pio_task+71}
Apr 14 17:17:24 pcap kernel:        <ffffffff8013afe6>{run_workqueue+155} <ffffffff8013b037>{worker_thread+0}
Apr 14 17:17:24 pcap kernel:        <ffffffff8013e167>{keventd_create_kthread+0} <ffffffff8013b13f>{worker_thread+264}
Apr 14 17:17:24 pcap kernel:        <ffffffff80125ea6>{default_wake_function+0} <ffffffff8013e167>{keventd_create_kthread+0}
Apr 14 17:17:24 pcap kernel:        <ffffffff80125ea6>{default_wake_function+0} <ffffffff8013e167>{keventd_create_kthread+0}
Apr 14 17:17:24 pcap kernel:        <ffffffff8013e13e>{kthread+129} <ffffffff8010b742>{child_rip+8}
Apr 14 17:17:24 pcap kernel:        <ffffffff8013e167>{keventd_create_kthread+0} <ffffffff8013e0bd>{kthread+0}
Apr 14 17:17:24 pcap kernel:        <ffffffff8010b73a>{child_rip+0}
Apr 14 17:17:24 pcap kernel: 
Apr 14 17:17:24 pcap kernel: Code: 83 8d d0 00 00 00 04 c7 83 10 07 00 00 03 00 00 00 eb 15 b8 
Apr 14 17:36:22 pcap -- MARK --
Apr 14 17:46:24 pcap kernel:  <4>ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:46:24 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:46:24 pcap last message repeated 2 times
Apr 14 17:46:34 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:46:34 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:46:34 pcap last message repeated 3 times
Apr 14 17:46:44 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:46:44 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:46:44 pcap last message repeated 2 times
Apr 14 17:46:54 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:46:54 pcap kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000003431C
Apr 14 17:46:54 pcap last message repeated 2 times
Apr 14 17:47:04 pcap kernel: ata4: status=0xd0 { Busy }
Apr 14 17:47:24 pcap kernel: PGD 7ec0e067 PUD 7e973067 PMD 0 
Apr 14 17:47:24 pcap kernel: CPU 1 
Apr 14 17:47:24 pcap kernel: Modules linked in: ipv6 button ac battery iptable_filter ip_tables x_tables cpufreq_userspace powernow_k8 freq_table evdev psmouse serio_raw pcspkr shpchp pci_hotplug ext3 jbd mbcache raid5 xor raid1 md_mod ide_generic ide_cd cdrom sd_mod sata_svw libata scsi_mod e1000 tg3 ehci_hcd ohci_hcd serverworks generic ide_core thermal processor fan
Apr 14 17:47:24 pcap kernel: Pid: 1240, comm: ata/1 Not tainted 2.6.16-1-amd64-k8-smp #1
Apr 14 17:47:24 pcap kernel: RIP: 0010:[_end+130480710/2132488192] <ffffffff880bca46>{:libata:ata_pio_poll+186}
Apr 14 17:47:24 pcap kernel: RSP: 0018:ffff81007de73e18  EFLAGS: 00010293
Apr 14 17:47:24 pcap kernel: RAX: 0000000112fb596c RBX: ffff81007e373480 RCX: ffffffff80347988
Apr 14 17:47:24 pcap kernel: RDX: ffffffff80347988 RSI: 0000000000000246 RDI: ffff81007e373480
Apr 14 17:47:24 pcap kernel: RBP: 0000000000000000 R08: ffffffff80347988 R09: 0000000000000009
Apr 14 17:47:24 pcap kernel: R10: 0000000100000000 R11: 0000000000000246 R12: 0000000000000004
Apr 14 17:47:24 pcap kernel: R13: 0000000000000002 R14: ffff81007e373480 R15: ffffffff880bd1f1
Apr 14 17:47:24 pcap kernel: FS:  00002afa939f9640(0000) GS:ffff81007fa42ac0(0000) knlGS:0000000000000000
Apr 14 17:47:24 pcap kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Apr 14 17:47:24 pcap kernel: CR2: 00000000000000d0 CR3: 000000007f99e000 CR4: 00000000000006e0
Apr 14 17:47:24 pcap kernel: Process ata/1 (pid: 1240, threadinfo ffff81007de72000, task ffff81007f2f5180)
Apr 14 17:47:24 pcap kernel: Stack: ffff81007e373480 ffff81007e373480 0000000000000000 ffff81007e0be5c0 
Apr 14 17:47:24 pcap kernel:        0000000000000206 ffffffff880bd238 ffff81007e0be5d8 ffff81007e373b30 
Apr 14 17:47:24 pcap kernel:        ffff81007e373b38 ffffffff8013afe6 
Apr 14 17:47:24 pcap kernel: Call Trace: <ffffffff880bd238>{:libata:ata_pio_task+71}
Apr 14 17:47:24 pcap kernel:        <ffffffff8013afe6>{run_workqueue+155} <ffffffff8013b037>{worker_thread+0}
Apr 14 17:47:24 pcap kernel:        <ffffffff8013e167>{keventd_create_kthread+0} <ffffffff8013b13f>{worker_thread+264}
Apr 14 17:47:24 pcap kernel:        <ffffffff80125ea6>{default_wake_function+0} <ffffffff8013e167>{keventd_create_kthread+0}
Apr 14 17:47:24 pcap kernel:        <ffffffff80125ea6>{default_wake_function+0} <ffffffff8013e167>{keventd_create_kthread+0}
Apr 14 17:47:24 pcap kernel:        <ffffffff8013e13e>{kthread+129} <ffffffff8010b742>{child_rip+8}
Apr 14 17:47:24 pcap kernel:        <ffffffff8013e167>{keventd_create_kthread+0} <ffffffff8013e0bd>{kthread+0}
Apr 14 17:47:24 pcap kernel:        <ffffffff8010b73a>{child_rip+0}
Apr 14 17:47:24 pcap kernel: 
Apr 14 17:47:24 pcap kernel: Code: 83 8d d0 00 00 00 04 c7 83 10 07 00 00 03 00 00 00 eb 15 b8 
Apr 14 18:16:22 pcap -- MARK --
Apr 14 18:16:33 pcap kernel:  <3>ata1: command 0xb0 timeout, stat 0x58 host_stat 0x0
Apr 14 18:18:03 pcap kernel: ATA: abnormal status 0x58 on port 0xFFFFC2000003401C
Apr 14 18:18:03 pcap last message repeated 3 times
Apr 14 18:18:03 pcap kernel: ata1: status=0xd8 { Busy }
Apr 14 18:18:03 pcap kernel: ATA: abnormal status 0xD8 on port 0xFFFFC2000003401C
Apr 14 18:18:03 pcap last message repeated 2 times
Apr 14 18:18:03 pcap kernel: ata1: status=0xd8 { Busy }
Apr 14 18:18:03 pcap kernel: sd 0:0:0:0: SCSI error: return code = 0x8000002
Apr 14 18:18:03 pcap kernel: sda: Current: sense key: Aborted Command
Apr 14 18:18:03 pcap kernel:     Additional sense: Scsi parity error
Apr 14 18:18:03 pcap kernel: end_request: I/O error, dev sda, sector 16000575
Apr 14 18:18:03 pcap kernel: ATA: abnormal status 0xD8 on port 0xFFFFC2000003401C
Apr 14 18:18:03 pcap last message repeated 2 times
Apr 14 18:18:03 pcap kernel: ata1: status=0xd8 { Busy }
Apr 14 18:18:03 pcap kernel: ^IOperation continuing on 1 devices
Apr 14 18:18:03 pcap kernel: ATA: abnormal status 0xD8 on port 0xFFFFC2000003401C
Apr 14 18:18:03 pcap last message repeated 3 times
Apr 14 18:18:03 pcap kernel: RAID1 conf printout:
Apr 14 18:18:03 pcap kernel:  --- wd:1 rd:2
Apr 14 18:18:03 pcap kernel:  disk 0, wo:1, o:0, dev:sda1
Apr 14 18:18:03 pcap kernel:  disk 1, wo:0, o:1, dev:sdb1
Apr 14 18:18:03 pcap kernel: RAID1 conf printout:
Apr 14 18:18:03 pcap kernel:  --- wd:1 rd:2
Apr 14 18:18:03 pcap kernel:  disk 1, wo:0, o:1, dev:sdb1
Apr 14 18:18:13 pcap kernel: ata1: status=0xd8 { Busy }
Apr 14 18:18:13 pcap kernel: ATA: abnormal status 0xD8 on port 0xFFFFC2000003401C
Apr 14 18:18:13 pcap last message repeated 2 times
Apr 14 18:18:23 pcap kernel: ata1: status=0xd8 { Busy }


    Here are the relevant boot messages:


Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks HT1000: IDE controller at PCI slot 0000:00:02.1
SvrWks HT1000: chipset revision 0
SvrWks HT1000: not 100%% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:DMA
Copyright (c) 1999-2005 Intel Corporation.
SCSI subsystem initialized
hdb: MATSHITADVD-ROM SR-8178, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14

ata1: SATA max UDMA/133 cmd 0xFFFFC20000034000 ctl 0xFFFFC20000034020 bmdma 0xFFFFC20000034030 irq 11
ata2: SATA max UDMA/133 cmd 0xFFFFC20000034100 ctl 0xFFFFC20000034120 bmdma 0xFFFFC20000034130 irq 11
ata3: SATA max UDMA/133 cmd 0xFFFFC20000034200 ctl 0xFFFFC20000034220 bmdma 0xFFFFC20000034230 irq 11
ata4: SATA max UDMA/133 cmd 0xFFFFC20000034300 ctl 0xFFFFC20000034320 bmdma 0xFFFFC20000034330 irq 11
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 ATA-7, max UDMA/133, 586114704 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_svw
ata2: SATA link up 1.5 Gbps (SStatus 113)
ata2: dev 0 ATA-7, max UDMA/133, 586114704 sectors: LBA48
ata2: dev 0 configured for UDMA/133
scsi1 : sata_svw
ata3: SATA link up 1.5 Gbps (SStatus 113)
ata3: dev 0 ATA-7, max UDMA/133, 586114704 sectors: LBA48
ata3: dev 0 configured for UDMA/133
scsi2 : sata_svw
ata4: SATA link up 1.5 Gbps (SStatus 113)
ata4: dev 0 ATA-7, max UDMA/133, 586114704 sectors: LBA48
ata4: dev 0 configured for UDMA/133
scsi3 : sata_svw
  Vendor: ATA       Model: Maxtor 6V300F0    Rev: VA11
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6V300F0    Rev: VA11
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6V300F0    Rev: VA11
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: Maxtor 6V300F0    Rev: VA11
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
 sda:<6>hdb: ATAPI 24X DVD-ROM drive, 256kB Cache, UDMA(66)
Uniform CD-ROM driver Revision: 3.20
 sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdb: 586114704 512-byte hdwr sectors (300091 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 586114704 512-byte hdwr sectors (300091 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
SCSI device sdc: 586114704 512-byte hdwr sectors (300091 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
 sdc: sdc1 sdc2
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 586114704 512-byte hdwr sectors (300091 MB)
sdd: Write Protect is off
SCSI device sdd: drive cache: write back
SCSI device sdd: 586114704 512-byte hdwr sectors (300091 MB)
sdd: Write Protect is off
SCSI device sdd: drive cache: write back
 sdd: sdd1 sdd2
sd 3:0:0:0: Attached scsi disk sdd
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: raid1 personality registered for level 1
raid5: automatically using best checksumming function: generic_sse
   generic_sse:  6084.000 MB/sec
raid5: using function: generic_sse (6084.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md0 stopped.
md: bind<sdb1>
md: bind<sda1>
raid1: raid set md0 active with 2 out of 2 mirrors
md: md1 stopped.
md: bind<sdb2>
md: bind<sdc2>
md: bind<sdd2>
md: bind<sda2>
raid5: device sda2 operational as raid disk 0
raid5: device sdd2 operational as raid disk 3
raid5: device sdc2 operational as raid disk 2
raid5: device sdb2 operational as raid disk 1
raid5: allocated 4262kB for md1
raid5: raid level 5 set md1 active with 4 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:4 fd:0
 disk 0, o:1, dev:sda2
 disk 1, o:1, dev:sdb2
 disk 2, o:1, dev:sdc2
 disk 3, o:1, dev:sdd2
md: md2 stopped.
md: bind<sdd1>
md: bind<sdc1>
raid1: raid set md2 active with 2 out of 2 mirrors


    Here is the output of lspci -n:


0000:00:01.0 0604: 1166:0036
0000:00:02.0 0600: 1166:0205
0000:00:02.1 0101: 1166:0214
0000:00:02.2 0601: 1166:0234
0000:00:03.0 0c03: 1166:0223 (rev 01)
0000:00:03.1 0c03: 1166:0223 (rev 01)
0000:00:03.2 0c03: 1166:0223 (rev 01)
0000:00:05.0 0300: 1002:4752 (rev 27)
0000:00:08.0 0604: 1166:0130 (rev a3)
0000:00:09.0 0604: 1166:0130 (rev a3)
0000:00:0a.0 0604: 1166:0132 (rev a3)
0000:00:0b.0 0604: 1166:0132 (rev a3)
0000:00:0c.0 0604: 1166:0132 (rev a3)
0000:00:0d.0 0604: 1166:0132 (rev a3)
0000:00:18.0 0600: 1022:1100
0000:00:18.1 0600: 1022:1101
0000:00:18.2 0600: 1022:1102
0000:00:18.3 0600: 1022:1103
0000:00:19.0 0600: 1022:1100
0000:00:19.1 0600: 1022:1101
0000:00:19.2 0600: 1022:1102
0000:00:19.3 0600: 1022:1103
0000:01:0d.0 0604: 1166:0104 (rev b2)
0000:01:0e.0 0104: 1166:024a
0000:02:06.0 0200: 8086:1079 (rev 03)
0000:02:06.1 0200: 8086:1079 (rev 03)
0000:04:04.0 0200: 14e4:166a (rev 03)
0000:04:04.1 0200: 14e4:166a (rev 03)


    Here is the output of lspci -vvv for the controller:


0000:01:0e.0 RAID bus controller: Broadcom BCM5785 (HT1000) SATA Native SATA Mode
        Subsystem: Broadcom BCM5785 (HT1000) SATA Native SATA Mode
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at bc00 [size=8]
        Region 1: I/O ports at b880 [size=4]
        Region 2: I/O ports at b800 [size=8]
        Region 3: I/O ports at b480 [size=4]
        Region 4: I/O ports at b400 [size=32]
        Region 5: Memory at fc9fe000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at fc9c0000 [disabled] [size=128K]
        Capabilities: [60] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=4
                Status: Bus=1 Dev=14 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=0, DMOST=4, DMCRS=2, RSCEM-
        Capabilities: [90] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
                Address: 00000000  Data: 0000

-- 
Robert Edmonds
-
: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux