sata_promise: random/intermittent errors

I've decided to post this to the linux-ide list to see if I can get to the
bottom of this problem I'm experiencing with sata_promise and my PATA drives.

After the dmesg dump I've pasted a thread from the linux-raid list, where I
was trying to troubleshoot and recover a destroyed raid5 array.

First a full history:

1) 2.6.17.13: 3-drive PATA raid5 array with one drive starting to give read
errors (legitimate according to SMART logs).
2) System lockups (no kernel panic seen) under load, which I suspect were due
to the read errors on the failing drive.
3) Upgraded to 2.6.20.
4) Raid5 issues occurred (handling of read errors caused the md device to die).
5) Applied the patch from Neil to fix raid5 error handling.
6) Replaced the failed drive and added a new drive at the same time to create
a 4-drive PATA array.
7) Attempted to grow the array from 3 -> 4 devices, which failed due to an
error similar to this:

ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdc: Current [descriptor]: sense key: Aborted Command
     Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
         00 00 00 00
end_request: I/O error, dev sdc, sector 260419647

8) Raid array was trashed; rebuilt the array and restored from backup.
9) From this point on the system is up and running, restored to a working
state. However, I'm still getting errors similar to the above during array
accesses (read/write), and they are not related to load. The array (while
being synced) manages to continue operating using another drive. My concern
is that this may happen on a degraded array in future.
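
For anyone reading the raw numbers in the error above, here is a minimal
sketch of how the status byte and the SCSI return code can be picked apart.
It assumes the standard ATA status register bit layout and the usual Linux
packing of the SCSI result word (driver, host, message and status bytes from
high to low). The function names decode_ata_status and split_scsi_result are
made up for illustration and are not part of any kernel or library API.

```python
# ATA status register bits, highest bit first. The labels are chosen to
# resemble the names printed in the log (e.g. "DriveReady SeekComplete").
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def split_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes.

    Assumed layout: driver byte (bits 31-24), host byte (23-16),
    message byte (15-8), status byte (7-0).
    """
    return {
        "driver": (result >> 24) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
    }


print(decode_ata_status(0x40))   # ['DriveReady']
print(decode_ata_status(0x50))   # ['DriveReady', 'SeekComplete']
print(split_scsi_result(0x08000002))
```

Read this way, status 0x40 has only the DriveReady bit set and the Error bit
clear, and 0x08000002 splits into a driver byte of 0x08 and a status byte of
0x02. To my understanding those correspond to "sense data available" and
CHECK CONDITION, which would match the sense dump that follows in the log.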

Note that the error I'm getting (shown above) has happened on both sdc and sdd,
and at different sectors (i.e. it is not a consistent read error). Also, the
SMART logs for both drives show no errors at all, and both short and long SMART
tests complete successfully. I suspect this is an issue in the driver and/or
my physical TX4000 card.
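
One way to check that the errors really are spread over devices and sectors
is to tally the end_request lines from a saved kernel log. The following is
only a sketch: it assumes the message format seen in the dmesg output in this
post, the function name tally_io_errors is hypothetical, and the sample lines
are the two I/O errors taken from that dump.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   end_request: I/O error, dev sdc, sector 260419647
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def tally_io_errors(lines):
    """Group failing sector numbers by device name."""
    errors = defaultdict(list)
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            errors[match.group(1)].append(int(match.group(2)))
    return dict(errors)


sample = [
    "ata3: command timeout",
    "end_request: I/O error, dev sdc, sector 260419647",
    "ata4: command timeout",
    "end_request: I/O error, dev sdd, sector 277596095",
]

for dev, sectors in sorted(tally_io_errors(sample).items()):
    print(dev, sectors)
```

To run it against a real log, pass an open file instead of the sample list,
for example tally_io_errors(open("kern.log")) where kern.log is whatever
file the kernel messages were saved to. A single failing drive would usually
show the same device and clustered sectors; errors scattered over several
devices point elsewhere.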

If you could shed any light on this I would appreciate it.

Thanks.
Regards.

------------- BEGIN DMESG DUMP -----------------

Linux version 2.6.20 (root@xerces) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #2 SMP Mon Feb 12 09:28:29 GMT-9 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009c800 end: 000000000009c800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009c800 size: 0000000000003800 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 000000007feec000 end: 000000007ffec000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000007ffec000 size: 0000000000003000 end: 000000007ffef000 type: 3
copy_e820_map() start: 000000007ffef000 size: 0000000000010000 end: 000000007ffff000 type: 2
copy_e820_map() start: 000000007ffff000 size: 0000000000001000 end: 0000000080000000 type: 4
copy_e820_map() start: 00000000fec00000 size: 0000000000001000 end: 00000000fec01000 type: 2
copy_e820_map() start: 00000000fee00000 size: 0000000000001000 end: 00000000fee01000 type: 2
copy_e820_map() start: 00000000ffff0000 size: 0000000000010000 end: 0000000100000000 type: 2
 BIOS-e820: 0000000000000000 - 000000000009c800 (usable)
 BIOS-e820: 000000000009c800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ffec000 (usable)
 BIOS-e820: 000000007ffec000 - 000000007ffef000 (ACPI data)
 BIOS-e820: 000000007ffef000 - 000000007ffff000 (reserved)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
1151MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f7ea0
Entering add_active_range(0, 0, 524268) 0 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->   229376
  HighMem    229376 ->   524268
early_node_map[1] active PFN ranges
    0:        0 ->   524268
On node 0 totalpages: 524268
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 1760 pages used for memmap
  Normal zone: 223520 pages, LIFO batch:31
  HighMem zone: 2303 pages used for memmap
  HighMem zone: 292589 pages, LIFO batch:31
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: ASUS     Product ID: PROD00000000 APIC at: 0xFEE00000
Processor #0 6:10 APIC version 16
Processor #1 6:10 APIC version 16
I/O APIC #2 Version 17 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 2
Allocating PCI resources starting at 88000000 (gap: 80000000:7ec00000)
Detected 2133.464 MHz processor.
Built 1 zonelists.  Total pages: 520173
Kernel command line: auto BOOT_IMAGE=Linux ro root=901 acpi=off pci=noacpi elevator=as
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x50
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 2072936k/2097072k available (1539k kernel code, 22916k reserved, 593k data, 200k init, 1179568k highmem)
virtual kernel memory layout:
    fixmap  : 0xfffa2000 - 0xfffff000   ( 372 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xf8800000 - 0xff7fe000   ( 111 MB)
    lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
      .init : 0xc031b000 - 0xc034d000   ( 200 kB)
      .data : 0xc0280c62 - 0xc0315230   ( 593 kB)
      .text : 0xc0100000 - 0xc0280c62   (1539 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 4269.42 BogoMIPS (lpj=2134710)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: CLK_CTL MSR was 60031223. Reprogramming to 20031223
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000420 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
Freeing SMP alternatives: 10k freed
CPU0: AMD Athlon(TM) MP 2800+ stepping 00
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 4266.31 BogoMIPS (lpj=2133156)
CPU: After generic identify, caps: 0383fbff c1cbfbff 00000000 00000000 00000000 00000000 00000000
CPU: CLK_CTL MSR was 60031223. Reprogramming to 20031223
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 0383fbff c1cbfbff 00000000 00000420 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: AMD Athlon(TM) MP 2800+ stepping 00
Total of 2 processors activated (8535.73 BogoMIPS).
ExtINT not setup in hardware but reported by MP table
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=0 pin2=0
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
migration_cost=1084
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf1f30, last bus=2
PCI: Using configuration type 1
Setting up standard PCI resources
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
Linux Plug and Play Support v0.97 (c) Adam Belay
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc5f0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc620, dseg 0xf0000
PnPBIOS: 13 nodes reported by PnP BIOS; 13 recorded by driver
SCSI subsystem initialized
libata version 2.00 loaded.
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:01:05.0
PCI: Using IRQ router AMD768 [1022/7443] at 0000:00:07.3
PCI->APIC IRQ transform: 0000:00:08.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:09.0[A] -> IRQ 17
PCI->APIC IRQ transform: 0000:01:05.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:02:04.0[A] -> IRQ 17
PCI->APIC IRQ transform: 0000:02:05.0[A] -> IRQ 18
PCI->APIC IRQ transform: 0000:02:05.1[B] -> IRQ 19
PCI->APIC IRQ transform: 0000:02:05.2[C] -> IRQ 16
PCI->APIC IRQ transform: 0000:02:06.0[A] -> IRQ 17
PCI->APIC IRQ transform: 0000:02:08.0[A] -> IRQ 19
pnp: 00:0f: ioport range 0xe400-0xe47f has been reserved
pnp: 00:0f: ioport range 0xe4e0-0xe4ff has been reserved
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: ee000000-efcfffff
  PREFETCH window: eff00000-fb7fffff
PCI: Bridge: 0000:00:10.0
  IO window: a000-afff
  MEM window: e8800000-ebffffff
  PREFETCH window: efd00000-efdfffff
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
checking if image is initramfs...it isn't (bad gzip magic numbers); looks like an initrd
Freeing initrd memory: 3072k freed
Machine check exception polling timer started.
highmem bounce pool size: 64 pages
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
BIOS failed to enable PCI standards compliance, fixing this error.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:03: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
PNP: PS/2 Controller [PNP0303,PNP0f13] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
TCP cubic registered
Starting balanced_irq
Using IPI Shortcut mode
input: AT Translated Set 2 keyboard as /class/input/input0
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 3072KiB [1 disk] into ram disk... 
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 200k freed
NET: Registered protocol family 1
md: raid1 personality registered for level 1
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
    ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hda: WDC WD800BB-00JHC0, ATA DISK drive
hdb: WDC WD2500JB-00GVC0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: WDC WD800BB-23DKA0, ATA DISK drive
hdd: HL-DT-STDVD-ROM GDR8163B, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >
hdb: max request size: 512KiB
hdb: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1
hdc: max request size: 512KiB
hdc: 156312576 sectors (80032 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hdc: cache flushes supported
 hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 hdc8 >
md: md0 stopped.
md: bind<hda1>
md: bind<hdc1>
raid1: raid set md0 active with 2 out of 2 mirrors
md: md1 stopped.
md: bind<hda2>
md: bind<hdc2>
raid1: raid set md1 active with 2 out of 2 mirrors
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
hda: cache flushes supported
hdc: cache flushes supported
hdb: cache flushes supported
Adding 2007992k swap on /dev/md0.  Priority:-1 extents:1 across:2007992k
EXT3 FS on md1, internal journal
Real Time Clock Driver v1.12ac
hdd: ATAPI 52X DVD-ROM drive, 256kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
ieee1394: raw1394: /dev/raw1394 device initialized
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[17]  MMIO=[e9800000-e98007ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
video1394: Installed video1394 module
AMD768 RNG detected
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd 0000:02:05.0: OHCI Host Controller
ohci_hcd 0000:02:05.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:02:05.0: irq 18, io mem 0xeb000000
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ohci_hcd 0000:02:05.1: OHCI Host Controller
ohci_hcd 0000:02:05.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:02:05.1: irq 19, io mem 0xea800000
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[005042f81010a4eb]
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
Intel(R) PRO/1000 Network Driver - version 7.3.15-k2
Copyright (c) 1999-2006 Intel Corporation.
e1000: 0000:00:09.0: e1000_probe: (PCI:66MHz:32-bit) 00:0e:0c:a0:04:dd
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs

scsi 0:0:0:0: Sequential-Access SONY     SDX-500C         0101 PQ: 0 ANSI: 2
 target0:0:0: Beginning Domain Validation
 target0:0:0: wide asynchronous
 target0:0:0: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 8)
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
sata_promise 0000:00:08.0: version 1.05
ata1: PATA max UDMA/133 cmd 0xF8AA6200 ctl 0xF8AA6238 bmdma 0x0 irq 16
ata2: PATA max UDMA/133 cmd 0xF8AA6280 ctl 0xF8AA62B8 bmdma 0x0 irq 16
ata3: PATA max UDMA/133 cmd 0xF8AA6300 ctl 0xF8AA6338 bmdma 0x0 irq 16
ata4: PATA max UDMA/133 cmd 0xF8AA6380 ctl 0xF8AA63B8 bmdma 0x0 irq 16
scsi1 : sata_promise
ata1.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: configured for UDMA/100
scsi2 : sata_promise
ata2.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata2.00: ata2: dev 0 multi count 0
ata2.00: configured for UDMA/100
scsi3 : sata_promise
ata3.00: ATA-7, max UDMA/100, 312581808 sectors: LBA48
ata3.00: ata3: dev 0 multi count 0
ata3.00: configured for UDMA/100
scsi4 : sata_promise
ata4.00: ATA-6, max UDMA/100, 312581808 sectors: LBA48
ata4.00: ata4: dev 0 multi count 0
ata4.00: configured for UDMA/100
scsi 1:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 2:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 3:0:0:0: Direct-Access     ATA      WDC WD1600JB-00R 20.0 PQ: 0 ANSI: 5
scsi 4:0:0:0: Direct-Access     ATA      WDC WD1600JB-00E 15.0 PQ: 0 ANSI: 5
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@xxxxxxxxxx
md: md2 stopped.
md: bind<hda3>
md: bind<hdc3>
raid1: raid set md2 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hda5>
md: bind<hdc5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md4 stopped.
md: bind<hda6>
md: bind<hdc6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md5 stopped.
md: bind<hda7>
md: bind<hdc7>
raid1: raid set md5 active with 2 out of 2 mirrors
md: md6 stopped.
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 1:0:0:0: Attached scsi disk sda
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 2:0:0:0: Attached scsi disk sdb
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 3:0:0:0: Attached scsi disk sdc
SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
sdd: Write Protect is off
sdd: Mode Sense: 00 3a 00 00
SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdd: sdd1
sd 4:0:0:0: Attached scsi disk sdd
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  4928.000 MB/sec
raid5: using function: pIII_sse (4928.000 MB/sec)
raid6: int32x1    855 MB/s
raid6: int32x2   1156 MB/s
raid6: int32x4    730 MB/s
raid6: int32x8    648 MB/s
raid6: mmxx1     1781 MB/s
raid6: mmxx2     3265 MB/s
raid6: sse1x1     464 MB/s
raid6: sse1x2     929 MB/s
raid6: using algorithm sse1x2 (929 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: device sda1 operational as raid disk 0
raid5: device sdc1 operational as raid disk 2
raid5: device sdb1 operational as raid disk 1
raid5: allocated 4204kB for md6
raid5: raid level 5 set md6 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
md: md7 stopped.
md: bind<hdc8>
md: bind<hda8>
raid1: raid set md7 active with 2 out of 2 mirrors
st: Version 20061107, fixed bufsize 32768, s/g segs 256
st 0:0:0:0: Attached scsi tape st0
st 0:0:0:0: st0: try direct i/o: yes (alignment 512 B)
 target0:0:0: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 8)
st0: Block limits 2 - 16777215 bytes.
program stinit is using a deprecated SCSI ioctl, please convert it to SG_IO
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdb1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth0: e1000_set_tso: TSO is Disabled
e1000: eth0: e1000_set_tso: TSO is Disabled
e1000: eth0: e1000_set_tso: TSO is Disabled
process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata1: no sense translation for status: 0x50
ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata2: no sense translation for status: 0x50
ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata2: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata3: no sense translation for status: 0x50
ata3: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
ata4: no sense translation for status: 0x50
ata4: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x50 { DriveReady SeekComplete }
st0: MTSETDRVBUFFER only allowed for root.
vmmon: module license 'unspecified' taints kernel.
/dev/vmmon[2331]: Module vmmon: registered with major=10 minor=165
/dev/vmmon[2331]: Module vmmon: initialized
/dev/vmnet: open called by PID 2366 (vmnet-bridge)
/dev/vmnet: hub 0 does not exist, allocating memory.
/dev/vmnet: port on hub 0 successfully opened
bridge-eth0: enabling the bridge
bridge-eth0: up
bridge-eth0: already up
bridge-eth0: attached
floppy0: no floppy controllers found
floppy0: no floppy controllers found
st 0:0:0:0: Attached scsi generic sg0 type 1
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 3:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: Attached scsi generic sg4 type 0
/dev/vmnet: open called by PID 2723 (vmware-vmx)
device eth0 entered promiscuous mode
bridge-eth0: enabled promiscuous mode
/dev/vmnet: port on hub 0 successfully opened
/dev/vmmon[2744]: host clock rate change request 0 -> 1001
/dev/vmnet: open called by PID 2972 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
md: bind<sdd1>
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
md: recovery of RAID array md6
md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
md: using 128k window, over a total of 156288256 blocks.
md: md6: recovery done.
RAID5 conf printout:
 --- rd:4 wd:4
 disk 0, o:1, dev:sda1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sdc1
 disk 3, o:1, dev:sdd1
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmmon[2744]: host clock rate change request 1001 -> 1002
/dev/vmmon[2744]: host clock rate change request 1002 -> 83
/dev/vmmon[2744]: host clock rate change request 83 -> 1001
/dev/vmmon[2744]: host clock rate change request 1001 -> 1002
/dev/vmmon[2744]: host clock rate change request 1002 -> 1001
/dev/vmnet: open called by PID 2988 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
/dev/vmnet: open called by PID 2989 (vmware-vmx)
/dev/vmnet: port on hub 0 successfully opened
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata3: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdc: Current [descriptor]: sense key: Aborted Command
    Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 01
end_request: I/O error, dev sdc, sector 260419647
raid5:md6: read error corrected (8 sectors at 260419584 on sdc1)
ata4: command timeout
ata4: no sense translation for status: 0x40
ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 4:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
    Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 00
end_request: I/O error, dev sdd, sector 277596095


------------- END DMESG DUMP -------------
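
For anyone reading the dump above, the numeric fields can be unpacked by hand.
The following is only a sketch, assuming the usual layout of the SCSI midlayer
result word (driver/host/msg/status bytes), SPC-3 descriptor-format sense data,
and the standard ATA status register bits. The function names are made up for
illustration and are not kernel or libata APIs.

```python
# Sketch only: field layouts are assumed from the SCSI midlayer result word,
# SPC-3 descriptor-format sense data and the ATA status register definition.

def decode_result(result):
    """Split a SCSI midlayer result word into its four bytes."""
    return {
        "driver": (result >> 24) & 0xFF,  # 0x08 = DRIVER_SENSE (sense data present)
        "host": (result >> 16) & 0xFF,    # 0x00 = DID_OK
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,          # 0x02 = CHECK CONDITION
    }


def decode_descriptor_sense(hex_bytes):
    """Decode the 8-byte header of descriptor-format sense data."""
    raw = bytes.fromhex(hex_bytes)
    if raw[0] & 0x7F != 0x72:
        raise ValueError("not current descriptor-format sense data")
    return {
        "sense_key": raw[1] & 0x0F,  # 0xb = ABORTED COMMAND
        "asc": raw[2],
        "ascq": raw[3],
        "additional_length": raw[7],
    }


def ata_status_bits(status):
    """Name the bits that are set in an ATA status register value."""
    names = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
    return [name for bit, name in names.items() if status & bit]


print(decode_result(0x08000002))
print(decode_descriptor_sense(
    "72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 00 00 00"))
print(ata_status_bits(0x40))
```

If those assumptions hold, status 0x40 is DRDY alone with ERR clear, which I
read as the drive not flagging any error itself; that would fit the "no sense
translation" message and the clean SMART logs, but it is an interpretation and
not a diagnosis.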


---------- Forwarded Message -----------
On Sun, 18 Feb 2007 07:13:28 -0500 (EST), Justin Piszcz wrote
> On Sun, 18 Feb 2007, Marc Marais wrote:
> 
> > On Sun, 18 Feb 2007 20:39:09 +1100, Neil Brown wrote
> >> On Sunday February 18, marcm@xxxxxxxxxxxxxxxx wrote:
> >>> Ok, I understand the risks which is why I did a full backup before doing
> >>> this. I have subsequently recreated the array and restored my data from
> >>> backup.
> >>
> >> Could you still please tell me exactly what kernel/mdadm version you
> >> were using?
> >>
> >> Thanks,
> >> NeilBrown
> >
> > 2.6.20 with the patch you supplied in response to the "md6_raid5 crash
> > email" I posted in linux-raid a few days ago. Just as background, I replaced
> > the failing drive and at the same time bought an additional drive in order
> > to increase the array size.
> >
> > mdadm -V = v2.6 - 21 December 2006. Compiled under Debian (stable).
> >
> > Also, I've just noticed another drive failure with the new array with a
> > similar error to what happened during the grow operation (although on a
> > different drive) - I wonder if I should post this to linux-ide?
> >
> > Feb 18 00:58:10 xerces kernel: ata4: command timeout
> > Feb 18 00:58:10 xerces kernel: ata4: no sense translation for status: 0x40
> > Feb 18 00:58:10 xerces kernel: ata4: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
> > Feb 18 00:58:10 xerces kernel: ata4: status=0x40 { DriveReady }
> > Feb 18 00:58:10 xerces kernel: sd 4:0:0:0: SCSI error: return code = 0x08000002
> > Feb 18 00:58:10 xerces kernel: sdd: Current [descriptor]: sense key: Aborted Command
> > Feb 18 00:58:10 xerces kernel:     Additional sense: No additional sense information
> > Feb 18 00:58:10 xerces kernel: Descriptor sense data with sense descriptors (in hex):
> > Feb 18 00:58:10 xerces kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> > Feb 18 00:58:10 xerces kernel:         00 00 00 00
> > Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector 35666775
> > Feb 18 00:58:10 xerces kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 3 devices
> >
> > Regards,
> > Marc
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Just out of curiosity:
> 
> Feb 18 00:58:10 xerces kernel: end_request: I/O error, dev sdd, sector 35666775
> 
> Can you run:
> 
> smartctl -d ata -t short /dev/sdd
> wait 5 min
> smartctl -d ata -t long /dev/sdd
> wait 2-3 hr
> smartctl -d ata -a /dev/sdd
> 
> And then e-mail that output to the list?
> 
> Justin.

I have smartmontools performing regular short and long scans but I will run 
the tests immediately and send the output of smartctl -a when done.

Note that I'm getting similar errors on sdc too (the latest about 5 minutes
ago). Interestingly, the SMART error logs for sdc and sdd show no errors at all.

ata3: command timeout
ata3: no sense translation for status: 0x40
ata3: translated ATA stat/err 0x40/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata4: status=0x40 { DriveReady }
sd 3:0:0:0: SCSI error: return code = 0x08000002
sdd: Current [descriptor]: sense key: Aborted Command
     Additional sense: No additional sense information
Descriptor sense data with sense descriptors (in hex):
         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
         00 00 00 00
end_request: I/O error, dev sdc, sector 260419647
raid5:md6: read error corrected (8 sectors at 260419584 on sdc1)
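
To compare incidents, the end_request lines are the easiest thing to collect.
A minimal sketch, assuming the log lines are unwrapped (one kernel message per
line); the function name and the sample text are illustrative only, with the
sector numbers taken from the errors quoted in this thread:

```python
import re
from collections import defaultdict

# Matches lines such as "end_request: I/O error, dev sdc, sector 260419647".
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text):
    """Return {device: [sectors]} for every end_request I/O error in log_text."""
    by_dev = defaultdict(list)
    for dev, sector in END_REQUEST.findall(log_text):
        by_dev[dev].append(int(sector))
    return dict(by_dev)


sample = """\
end_request: I/O error, dev sdc, sector 260419647
end_request: I/O error, dev sdd, sector 277596095
end_request: I/O error, dev sdd, sector 35666775
"""
print(failed_sectors(sample))
```

The whole-disk sector 260419647 and the 260419584 that md reports for sdc1
differ by 63, which would be consistent with the partition starting at sector
63. That is an assumption on my part; fdisk -lu would confirm it.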

Will post logs when done...

Marc

--
------- End of Forwarded Message -------


