On 12/6/06, Jonas Lundgren <jonas@xxxxxxxx> wrote:
Tejun Heo wrote:
[--snip--]
>> IF the system does recover, I start getting
>> the extremely low disk write speeds that I reported above, and only a
>> reboot will get the performance back to regular.
>
> Please full dmesg after your computer got really slow.  I suspect libata
> decided to switch to PIO mode.

Here's the relevant part, if you want the whole dmesg look at:
http://pastebin.ca/269581

[--snip--]
[82048.255126] can't create port
[85055.578172] reiser4[unrar(30787)]: disable_write_barrier (fs/reiser4/wander.c:234)[zam-1055]:
[85055.578174] NOTICE: md5 does not support write barriers, using synchronous write instead.
[87825.501998] can't create port
[89520.019538] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[89520.019545] ata2.00: cmd c8/00:08:fe:68:df/00:00:00:00:00/e1 tag 0 data 4096 in
[89520.019547]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[89520.322292] ata2: soft resetting port
[89527.515891] ata2: port is slow to respond, please be patient (Status 0xd0)
[89550.457913] ata2: port failed to respond (30 secs, Status 0xd0)
[89550.457917] ata2: softreset failed (device not ready)
[89550.457921] ata2: softreset failed, retrying in 5 secs
[89555.454103] ata2: hard resetting port
[89562.799693] ata2: port is slow to respond, please be patient (Status 0x80)
[89585.740239] ata2: port failed to respond (30 secs, Status 0x80)
[89585.740242] ata2: COMRESET failed (device not ready)
[89585.740245] ata2: hardreset failed, retrying in 5 secs
[89590.736978] ata2: hard resetting port
[89598.081854] ata2: port is slow to respond, please be patient (Status 0x80)
[89617.604742] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[89617.611034] ata2.00: configured for UDMA/100
[89617.611042] ata2: EH complete
[89617.623426] SCSI device sdb: 145226112 512-byte hdwr sectors (74356 MB)
[89617.633551] sdb: Write Protect is off
[89617.633553] sdb: Mode Sense: 00 3a 00 00
[89617.637765] SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

>
>> I don't know what causes it, but most of the times when I've gotten it
>> my system has been under heavy load (compiling, downloading torrents in
>> 11mb/sec etc). Please let me know if you want any additional info, want
>> me to try something out, or whatever. My recent hardware upgrade for
>> around $1200 (to a core2duo system, i965 mobo) is just going to waste
>> because of this problem. :/
>
> Heh, nice machine you got there.  When you look at the dmesg, do the
> error messages occur only on one of the two drives?  Or are both
> affected?  If only one is affected,
>
> 1. swap the two.  you'll probably have to dance a little bit with boot
> loader but md should handle that fine once the kernel is loaded.  does
> the errors persist?  on which device do they occur?  do they follow the
> drive or stay on the mobo port?

It follows the drive. (Hardware problem?)

>
> 2. try different cable / port.  if you change port, again, you need to
> dance w/ boot loader.  who's carrying the error messages with it?

Read above.

>
> 3. try different power plug from different power lane.

I've got a really good power supply, which can handle max 560W on the
+12 / -12 V rail alone.

>
>> I just got so glad when I saw the post of this on linux-ide, I've been
>> searching like crazy to find another person having the same problem (and
>> possibly a solution) for the past 2-3 weeks or so.
>
> My first guess is frequent transmission errors.  Please report the test
> results.  Thanks.
>

I guess it could only be a hardware problem since the error follows the
drive, and both the drives are identical, so it can't be a firmware
problem. Correct me if I'm wrong.

I just checked the smart status, and the drive passes, but it seems like
it's going down though, on the other hand I might misread the results.

smartctl -d ata -A /dev/sdb
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   113   111   021    Pre-fail  Always       -       4875
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       237
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       18117
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
194 Temperature_Celsius     0x0022   117   108   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   179   051    Pre-fail  Offline      -       0

The "Reallocated_Sector_Ct" and "Reallocated_Event_Count" worry me.
Should I be worried?
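
Yes, they are a sign that the drive is wearing out! It has already
remapped 747 sectors, and the normalized value for
Reallocated_Sector_Ct (153) is not far above its threshold (140).

Andy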
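
For anyone who wants to check a table like the one above without reading it by eye, here is a rough Python sketch. It is only an illustration, not part of smartmontools: the function names, the list of watched attributes and the MARGIN value are my own assumptions, and the sample rows are copied from the smartctl output quoted in this thread. It flags remapping-related attributes with a non-zero raw count, and attributes whose normalized VALUE sits close to THRESH.

```python
import re

# A few rows copied from the smartctl -A output quoted in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   153   153   140    Pre-fail  Always       -       747
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       639
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Assumption: attributes where any non-zero raw count deserves a look.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# Assumption: warn when VALUE is within this many points of THRESH.
MARGIN = 20


def parse_attributes(text):
    """Return one dict per attribute row found in smartctl -A style text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            ident, name, value, worst, thresh, raw = m.groups()
            rows.append({
                "id": int(ident), "name": name, "value": int(value),
                "worst": int(worst), "thresh": int(thresh), "raw": int(raw),
            })
    return rows


def find_warnings(rows):
    """Return human-readable warnings for suspicious attributes."""
    out = []
    for r in rows:
        if r["name"] in WATCHED and r["raw"] > 0:
            out.append(f"{r['name']}: raw count {r['raw']} is non-zero")
        # A threshold of 0 means the attribute never trips, so skip it.
        if r["thresh"] > 0 and r["value"] - r["thresh"] <= MARGIN:
            out.append(
                f"{r['name']}: value {r['value']} is within {MARGIN} "
                f"of threshold {r['thresh']}"
            )
    return out


if __name__ == "__main__":
    for warning in find_warnings(parse_attributes(SAMPLE)):
        print(warning)
```

To use it on a live drive you would feed it the text printed by smartctl -A instead of SAMPLE. The margin of 20 is arbitrary; pick whatever suits you.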

--
-Jonas

Name: Jonas Lundgren
ICQ#: 52064961
Mail: jonas@xxxxxxxx
IRC: neon / neonman @ EFnet, Undernet, Quakenet, freenode
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html