Re: 2.6.36: Dropped interrupts in ata_piix

Hi,

I am replying to this message:
http://www.spinics.net/lists/linux-scsi/msg47723.html
(I just subscribed to the mailing list, so I added the Reply-To header
manually; hopefully it is correct.)

I seem to have very similar symptoms with 2.6.36. I can reproduce them
easily on a live system, and I could even grant shell access if necessary.
It didn't happen at all with the stock Ubuntu 10.04 LTS kernel
(2.6.32-25-generic-pae).

Currently, one of my three SATA hard drives is affected.
I left a dd command running overnight on each drive, reading its contents
to /dev/null. The affected drive didn't finish, while the other drives
finished at normal speed.
I have included the dmesg and smartctl output below.
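
For anyone who wants to repeat the read test, it could look roughly like
the sketch below. The exact dd options used are not recorded in this
message, so the 1 MiB block size and the helper name read_test are
assumptions, and /dev/sdX is a placeholder for the drive under test.

```shell
# Read a whole device (or file) sequentially and discard the data.
# read_test is a hypothetical helper name; bs=1M is an assumed block size.
read_test() {
    dev=$1
    # dd prints its throughput summary on stderr when it finishes.
    dd if="$dev" of=/dev/null bs=1M
}
```

Usage would be something like "read_test /dev/sdX" as root, with dmesg
watched in another terminal while it runs.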

Should I try reverting the patches mentioned in that thread, or something
else? Could anyone provide direct links to these patches? I am not normally
involved in kernel development, so I'm fairly new to this.

Here are my logs. The SMART error log is empty and the overall health
self-assessment passes, but I see a lot of errors in dmesg.

$ sudo smartctl -a /dev/sdc
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EADS-00S2B0
Serial Number:    WD-WCAVY0359972
Firmware Version: 04.05G04
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Nov 29 12:18:20 2010 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (43200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   157   133   021    Pre-fail  Always       -       9108
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1198
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11337
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   172   172   000    Old_age   Always       -       86640
194 Temperature_Celsius     0x0022   109   095   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1279
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      5200         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
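
One attribute in the table above may be worth tracking across test runs:
UDMA_CRC_Error_Count, which has a raw value of 1279 even though the error
log is empty. This counter records interface transfer errors, so it may be
useful to check whether it increases while the dmesg errors occur. A small
sketch for pulling out a single raw value follows; the helper name
smart_raw is an assumption, and it relies on the ten-column attribute
layout shown above.

```shell
# Print the RAW_VALUE column of one SMART attribute.
# Reads attribute-table text (as printed by smartctl -A) on stdin.
# smart_raw is a hypothetical helper name.
smart_raw() {
    # Column 2 is ATTRIBUTE_NAME and column 10 is RAW_VALUE in this layout.
    awk -v name="$1" '$2 == name { print $10 }'
}
```

It could be used as "sudo smartctl -A /dev/sdc | smart_raw UDMA_CRC_Error_Count"
before and after a read test, comparing the two numbers.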

$ dmesg|tail -30
[51214.053097] ata4.00: failed command: READ DMA EXT
[51214.055504] ata4.00: cmd 25/00:00:bf:0d:38/00:02:94:00:00/e0 tag 0 dma 262144 in
[51214.055507]          res 40/00:00:00:00:00/84:01:09:00:00/00 Emask 0x24 (host bus error)
[51214.060435] ata4.00: status: { DRDY }
[51214.062824] ata4: soft resetting link
[51214.339049] ata4.00: configured for UDMA/33
[51214.339059] ata4.00: device reported invalid CHS sector 0
[51214.339117] ata4: EH complete
[51253.024028] ata4: lost interrupt (Status 0x51)
[51253.024050] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[51253.026504] ata4.00: BMDMA stat 0x26, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0
[51253.029131] ata4.00: failed command: READ DMA EXT
[51253.031523] ata4.00: cmd 25/00:00:bf:49:38/00:01:94:00:00/e0 tag 0 dma 131072 in
[51253.031526]          res 40/00:00:00:00:00/84:01:09:00:00/00 Emask 0x24 (host bus error)
[51253.036463] ata4.00: status: { DRDY }
[51253.038863] ata4: soft resetting link
[51253.339048] ata4.00: configured for UDMA/33
[51253.339055] ata4.00: device reported invalid CHS sector 0
[51253.339066] ata4: EH complete
[51288.008028] ata4: lost interrupt (Status 0x51)
[51288.008050] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[51288.010468] ata4.00: BMDMA stat 0x26, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0
[51288.013096] ata4.00: failed command: READ DMA EXT
[51288.015506] ata4.00: cmd 25/00:00:bf:5f:38/00:01:94:00:00/e0 tag 0 dma 131072 in
[51288.015509]          res 40/00:00:00:00:00/84:01:09:00:00/00 Emask 0x24 (host bus error)
[51288.020517] ata4.00: status: { DRDY }
[51288.022906] ata4: soft resetting link
[51288.339041] ata4.00: configured for UDMA/33
[51288.339049] ata4.00: device reported invalid CHS sector 0
[51288.339059] ata4: EH complete
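
To gauge how often these events occur per port, the kernel log can be
summarized with something like the sketch below. The helper name
summarize_ata_errors is an assumption, and it only matches the three
message strings that appear in the excerpt above.

```shell
# Count libata error events per port from kernel log text on stdin.
# summarize_ata_errors is a hypothetical helper name.
summarize_ata_errors() {
    awk '
    match($0, /ata[0-9]+(\.[0-9]+)?: /) {
        # Take the port prefix without the trailing ": ".
        port = substr($0, RSTART, RLENGTH - 2)
        # Fold device names such as ata4.00 into their port, ata4.
        sub(/\..*/, "", port)
        seen[port] = 1
        if ($0 ~ /lost interrupt/)      lost[port]++
        if ($0 ~ /failed command/)      failed[port]++
        if ($0 ~ /soft resetting link/) resets[port]++
    }
    END {
        for (p in seen)
            printf "%s lost_interrupt=%d failed_command=%d soft_reset=%d\n",
                   p, lost[p], failed[p], resets[p]
    }'
}
```

It would be run as "dmesg | summarize_ata_errors", printing one line per
port seen in the log.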

My kernel is stock except for the BFS scheduler patches, and I don't see how those could cause this.

Best Regards
Mikko Korkalo

