On Wed, Dec 22, 2010 at 5:43 AM, Rogier Wolff <R.E.Wolff@xxxxxxxxxxxx> wrote:
>
> Unquoted text below is from either me or from my friend.
>
>
> Someone suggested we try an older kernel as if kernel 2.6.32 would not
> have this problem. We do NOT think it suddenly started with a certain
> kernel version. I was just hoping to have you kernel-guys help with
> prodding the kernel into revealing which component was screwing things
> up....
>
>
> On Mon, Dec 20, 2010 at 01:32:44PM -0500, Greg Freemyer wrote:
>> On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont
>> <bonbons@xxxxxxxxxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > [ccing linux-ide]
>> >
>> > Please provide the part of kernel log showing initialization of your
>> > disk controller(s) as well as detection of all the discs.
>
>
> sata_sil 0000:03:01.0: version 2.4
> sata_sil 0000:03:01.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
> sata_sil 0000:03:01.0: Applying R_ERR on DMA activate FIS errata fix
> scsi2 : sata_sil
> scsi3 : sata_sil
> scsi4 : sata_sil
> scsi5 : sata_sil
> ata3: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed200080 irq 24
> ata4: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed2000c0 irq 24
> ata5: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed200280 irq 24
> ata6: SATA max UDMA/100 mmio m1024@0xed200000 tf 0xed2002c0 irq 24
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata3.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata3.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata3.00: configured for UDMA/100
> scsi 2:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> usb 2-2: new low speed USB device using uhci_hcd and address 2
> ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata4.00: ATA-7: SAMSUNG HD103SI, 1AG01118, max UDMA7
> ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata4.00: configured for UDMA/100
> scsi 3:0:0:0: Direct-Access ATA SAMSUNG HD103SI 1AG0 PQ: 0 ANSI: 5
> ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata5.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata5.00: configured for UDMA/100
> scsi 4:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata6.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata6.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata6.00: configured for UDMA/100
> scsi 5:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> sd 2:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 3:0:0:0: [sdb] Write Protect is off
> sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 4:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 4:0:0:0: [sdc] Write Protect is off
> sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 5:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 5:0:0:0: [sdd] Write Protect is off
> sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 5:0:0:0: [sdd] Write Protect is off
> sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sdb: sdb1 sdb2 sdb3 sdb4
> sd 3:0:0:0: [sdb] Attached SCSI disk
> sda: sda1 sda2 sda3 sda4
> sd 2:0:0:0: [sda] Attached SCSI disk
> sdc: sdc1 sdc2 sdc3 sdc4
> sd 4:0:0:0: [sdc] Attached SCSI disk
> sdd: sdd1 sdd2 sdd3 sdd4
> sd 5:0:0:0: [sdd] Attached SCSI disk
>
>
>
>> > Verbose lspci output for
>> > the disc controller and $(smartctl -i -A $disk)
>> > output might be useful as well.
>
>
> 03:01.0 Mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
>     Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>     Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 32, Cache Line Size: 32 bytes
>     Interrupt: pin A routed to IRQ 24
>     Region 0: I/O ports at 4020 [size=8]
>     Region 1: I/O ports at 4014 [size=4]
>     Region 2: I/O ports at 4018 [size=8]
>     Region 3: I/O ports at 4010 [size=4]
>     Region 4: I/O ports at 4000 [size=16]
>     Region 5: Memory at ed200000 (32-bit, non-prefetchable) [size=1K]
>     [virtual] Expansion ROM at e8000000 [disabled] [size=512K]
>     Capabilities: [60] Power Management version 2
>         Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
>     Kernel driver in use: sata_sil
>     Kernel modules: sata_sil
>
>
> But also tried onboard card:
>
> 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
>     Subsystem: Super Micro Computer Inc Device 7980
>     Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0
>     Interrupt: pin A routed to IRQ 18
>     Region 0: I/O ports at 01f0 [size=8]
>     Region 1: I/O ports at 03f4 [size=1]
>     Region 2: I/O ports at 0170 [size=8]
>     Region 3: I/O ports at 0374 [size=1]
>     Region 4: I/O ports at 30a0 [size=16]
>     Kernel driver in use: ata_piix
>     Kernel modules: ata_generic, pata_acpi, ata_piix, ide-pci-generic, piix
>
> smartctl output:
>
> smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Green (Adv. Format) family
> Device Model:     WDC WD10EARS-00Y5B1
> Serial Number:    WD-WCAV55759454
> Firmware Version: 80.00A80
> User Capacity:    1,000,204,886,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Tue Dec 21 20:06:00 2010 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   132   119   021    Pre-fail  Always       -       6391
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
> 193 Load_Cycle_Count        0x0032   164   164   000    Old_age   Always       -       109955
> 194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       38
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>
>
> The others are very similar....
>
>
>> >
>> > Did you try the individual discs on a completely different system (e.g.
>> > plain desktop system) and what revision of SATA are both components
>> > supporting?
>
> Yes I did. The disks were installed in an MSI/Core2DUO based desktop
> system. No problems at all. Transfer rates up to 200MB/s.
>
>
> The SIL 3114 chip is 1.5Gbps SATA.
>
>
> Searching for information on the WD drives I stumbled across:
>
> http://community.wdc.com/t5/Other-Internal-Drives/1-TB-WD10EARS-desynch-issues-in-RAID/m-p/11559
>
> Where it seems that WD simply says not to use these drives in a RAID.
> I have experience with "Raid Edition" drives: They go bad at a MUCH
> too high rate. If we can't use the non-RAID drives for a RAID application,
> then there is just ONE possible option: STAY AWAY FROM WESTERN DIGITAL.
>
> Western Digital claims it has the right to mess things up if you put a
> non-RAID drive in a RAID configuration. Well, fine. Then they can also
> mess things up in normal situations, because with Linux software RAID
> the drive sees no difference between RAID accesses and normal ones.
>
> (If you click through and read their entry in the knowledge base,
> you'd notice that it should be more or less the other way around.
> Linux will drop the RAID-enabled drive from the RAID once it reports
> an error on a sector, which it does within seven seconds, whereas the
> desktop drive would remain operational until Linux times out (30 seconds?))
>
>
>
> More hardware info:
>
> System: Supermicro PDSMi, 4xDDR2 1GB, disks and controllers as above.
> Current kernel version: 2.6.36.2
> Problem was also present in kernel 2.6.33 (sorry, cannot downgrade again.
> This is a production system...)
>
> uname -a:
> Linux jcz.nl 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:32:37 CET 2010 x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux
>
> Disk layout:
>
> major minor  #blocks  name
>
>    8        0  976762584 sda
>    8        1     240943 sda1
>    8        2   19535040 sda2
>    8        3    1951897 sda3
>    8        4  955032120 sda4
>    8       16  976762584 sdb
>    8       17     240943 sdb1
>    8       18   19535040 sdb2
>    8       19    1951897 sdb3
>    8       20  955032120 sdb4
>    8       32  976762584 sdc
>    8       33     240943 sdc1
>    8       34   19535040 sdc2
>    8       35    1951897 sdc3
>    8       36  955032120 sdc4
>    8       48  976762584 sdd
>    8       49     240943 sdd1
>    8       50   19535040 sdd2
>    8       51    1951897 sdd3
>    8       52  955032120 sdd4
>    9      127     240832 md127
>    9        1   39067648 md1
>    9      126 1910063104 md126
>    9      125    3903488 md125
>
> MDstat:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md125 : active raid5 sdd3[5](S) sdb3[4] sda3[0] sdc3[3]
>       3903488 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md126 : active raid5 sda4[0] sdd4[3] sdc4[5](S) sdb4[4]
>       1910063104 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md1 : active raid5 sda2[0] sdd2[3](S) sdb2[1] sdc2[4]
>       39067648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md127 : active raid1 sdd1[3](S) sda1[0] sdb1[1] sdc1[2]
>       240832 blocks [3/3] [UUU]
>
> unused devices: <none>
> rootfs / rootfs rw 0 0
> proc /proc proc rw,relatime 0 0
> sys /sys sysfs rw,relatime 0 0
> udev /dev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=506317,mode=755 0 0
> /dev/disk/by-label/rootfs / ext4 rw,relatime,barrier=1,stripe=256,data=ordered 0 0
> devpts /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
> shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
> /dev/md127 /boot ext3 rw,relatime,errors=continue,barrier=0,data=writeback 0 0
> /dev/md126 /data ext4 rw,relatime,barrier=1,data=ordered 0 0
>
>
> Because of the severity of the problems (which remain after trying
> another SATA card), I have already bought a new Supermicro server. Let's
> hope that helps.

The Load_Cycle_Count values are very high, and that means your drive
heads are parking all the time, possibly multiple times a minute.

I don't know if it's your problem, but I'd say something is wrong, and
I've seen excessive head parking cause disk write failures in Windows.
In Linux I think it just wears out your drive way prematurely. And of
course I/Os are delayed if the heads are parked when the commands hit
the drive.

There is a Linux package specifically targeting drives that have this
issue: storage-fixup. Hopefully it can at least keep your heads from
parking continuously.

1) Be sure you have the userspace package storage-fixup installed.

2) Look in /etc/storage-fixup.conf and see if your drives are in the
list. If not, try to work with the storage-fixup maintainer (Tejun
Heo?) to get your drives added.

And while testing, watch Load_Cycle_Count and ensure it is not
increasing too fast, i.e. several times an hour is fine; several times
per minute is too much.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
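A minimal Python sketch of the Load_Cycle_Count check suggested in the reply, assuming the ten-column layout of `smartctl -A` attribute rows (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function names and the numeric cut-offs (at most 10 cycles/hour counted as "fine", 120 cycles/hour, i.e. two per minute, counted as "too many") are hypothetical, chosen only to turn "several times an hour is fine, several times per minute is too much" into numbers. Applied to the values posted in the thread (Power_On_Hours 7189, Load_Cycle_Count 109955), the lifetime average comes out at roughly 15 load cycles per hour. That figure is averaged over the drive's whole powered-on life, so it can hide bursts; two samples taken a known interval apart show the current rate better.

```python
"""Estimate how often a drive loads/unloads its heads from `smartctl -A` text.

Assumption: attribute rows use the 10-column layout printed by smartmontools,
with the first token of RAW_VALUE in the tenth column.
"""
from __future__ import annotations

# Header plus two attribute rows copied from the smartctl output quoted above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
193 Load_Cycle_Count        0x0032   164   164   000    Old_age   Always       -       109955
"""

# Hypothetical thresholds for "several per hour is fine, several per minute is too much".
FINE_PER_HOUR = 10
TOO_MANY_PER_HOUR = 120  # two load cycles per minute


def parse_smart_raw_values(text: str) -> dict[str, int]:
    """Map ATTRIBUTE_NAME -> integer RAW_VALUE for each attribute row."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row starts with "ID#".
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def cycles_per_hour(count_before: int, count_after: int, elapsed_seconds: float) -> float:
    """Load cycles per hour between two Load_Cycle_Count samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_after - count_before) * 3600 / elapsed_seconds


def classify(rate_per_hour: float) -> str:
    """Bucket a load-cycle rate using the hypothetical thresholds above."""
    if rate_per_hour <= FINE_PER_HOUR:
        return "fine"
    if rate_per_hour >= TOO_MANY_PER_HOUR:
        return "too many"
    return "borderline"


if __name__ == "__main__":
    raw = parse_smart_raw_values(SAMPLE)
    # Lifetime average: all load cycles so far divided by all powered-on hours.
    lifetime = cycles_per_hour(0, raw["Load_Cycle_Count"], raw["Power_On_Hours"] * 3600)
    print(f"lifetime average: {lifetime:.1f} load cycles/hour ({classify(lifetime)})")
```

To watch the current rate while testing, one could run `smartctl -A /dev/sda` twice a known interval apart (an hour, say), parse both outputs with `parse_smart_raw_values`, and pass the two Load_Cycle_Count values and the elapsed seconds to `cycles_per_hour`. The sketch only parses text; it does not talk to the drive itself.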