Listmates (Tobias),

I have a server with 2 dmraid arrays (4 drives -> 2 arrays) that has been bulletproof for years. A month ago (either coincidentally or due to a bug in the SUSE 11.2 kernel for client ssh/sftp sessions) I began experiencing sda errors on the array composed of the sda and sdc drives. The errors took the form of:

Dec 5 20:48:48 nirvana sshd[30922]: error: ssh_msg_send: write
Dec 5 20:49:10 nirvana sshd[30965]: Accepted keyboard-interactive/pam for legaleagle from 192.168.6.102 port 36663 ssh2
Dec 5 20:50:12 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:12 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:12 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:12 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:12 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:12 nirvana kernel: ata3: EH complete
Dec 5 20:50:14 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:14 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:14 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:14 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:14 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:14 nirvana kernel: ata3: EH complete
Dec 5 20:50:16 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:16 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:16 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:23 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:23 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:23 nirvana kernel: ata3: EH complete
Dec 5 20:50:23 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:23 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:23 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:23 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:23 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:23 nirvana kernel: ata3: EH complete
Dec 5 20:50:23 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:23 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:23 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:23 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:23 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:23 nirvana kernel: ata3: EH complete
Dec 5 20:50:23 nirvana kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 20:50:23 nirvana kernel: ata3.00: BMDMA stat 0x25
Dec 5 20:50:23 nirvana kernel: ata3.00: cmd 25/00:08:33:0c:8c/00:00:34:00:00/e0 tag 0 cdb 0x0 data 4096 in
Dec 5 20:50:23 nirvana kernel: res 51/40:00:39:0c:8c/40:00:34:00:00/e0 Emask 0x9 (media error)
Dec 5 20:50:23 nirvana kernel: ata3.00: configured for UDMA/133
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Dec 5 20:50:23 nirvana kernel: Descriptor sense data with sense descriptors (in hex):
Dec 5 20:50:23 nirvana kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 5 20:50:23 nirvana kernel: 34 8c 0c 39
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 5 20:50:23 nirvana kernel: end_request: I/O error, dev sda, sector 881593401
Dec 5 20:50:23 nirvana kernel: ata3: EH complete
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Write Protect is off
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Dec 5 20:50:23 nirvana kernel: sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I booted from the install disk, assembled the arrays, and ran fsck -c -y on the individual disks. That found a bad block on sda and corrected the problem, and things were fine for a month. Then the same sda error appeared.

Currently I have disabled dmraid and have the two disks running independently with zero errors. I'm keeping the contents mirrored with a cron job that basically does a 'cp -a / /mnt/sda', where sdc is the disk that is booted from and sda is mounted at /mnt/sda.

This experience has raised a question that I can't find the answer to: "How does dmraid handle a bad block?" Also (hardware issues aside), is there anything from a kernel/drive controller standpoint that could be invoking the error?

After running the disks independently for a week without a single error, I think I'll reassemble the array and rebuild it. Any other thoughts/suggestions? Obviously there was a bad block that developed on sda, but adding it to the bad block table fixed it. I have a pair of 750 GB drives coming to replace this set, but I'm curious about the bad block handling with dmraid. If 1 bad block is enough to kill the array, then that's not a very robust system.

The disks are both Seagate 500 GB drives (ST3500641AS).
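As a cross-check on the kernel log above, the failing sector can be recovered from the "res" taskfile line and compared with the sector reported by end_request. The following is only a minimal Python sketch: the helper name taskfile_lba48 is hypothetical, and the field order it assumes (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) is my reading of the libata log format, not something stated in the log itself.

```python
def taskfile_lba48(taskfile: str) -> int:
    """Extract the 48-bit LBA from a libata 'cmd' or 'res' taskfile string.

    ASSUMPTION: fields are laid out as
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    _, low, high, _ = taskfile.split("/")
    _, _, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _, _, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":"))
    # The "hob" (high-order byte) registers carry LBA bits 24..47.
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


# Values copied from the log above.
CMD = "25/00:08:33:0c:8c/00:00:34:00:00/e0"   # read request sent to the drive
RES = "51/40:00:39:0c:8c/40:00:34:00:00/e0"   # drive's error response
REPORTED_SECTOR = 881593401                    # from the end_request line
TOTAL_SECTORS = 976773168                      # from the "hardware sectors" line

start_lba = taskfile_lba48(CMD)
failed_lba = taskfile_lba48(RES)
# The last four bytes of the descriptor sense data ("34 8c 0c 39") hold the same LBA.
sense_lba = int("348c0c39", 16)

if __name__ == "__main__":
    print("request starts at LBA", start_lba)
    print("drive reports failure at LBA", failed_lba)
    print("matches end_request sector:", failed_lba == REPORTED_SECTOR == sense_lba)
    print("offset within the 8-sector request:", failed_lba - start_lba)
    print("position on disk: %.1f%%" % (100.0 * failed_lba / TOTAL_SECTORS))
```

If the assumed field order is right, every retry in the log is failing on the same single sector of sda, which would be consistent with one bad block rather than a controller or cabling fault.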

The system hardware is:

System Information
    Manufacturer: TYAN Computer Corp
    Product Name: S2865
BIOS Information
    Vendor: Phoenix Technologies, LTD
    Version: 6.00 PG
    Release Date: 06/20/2005
    Address: 0xE0000
    Runtime Size: 128 kB
    ROM Size: 512 kB
    Characteristics:
Processor Information
    Socket Designation: Socket 939
    Type: Central Processor
    Family: Athlon 64
    Manufacturer: AMD
    ID: 32 0F 02 00 FF FB 8B 17
    Signature: Family 15, Model 35, Stepping 2

All thoughts welcomed...

-- 
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com