This mail summarizes a problem that seems to have persisted across a number of mainline releases (at least on certain hardware), and for which we would like some advice on how best to approach diagnosis and a fix.

In order to reduce power usage we have been trying to make use of the SATA ALPM feature in various kernel releases. However, this has resulted in reports [1] from users who see timeouts on SATA commands, apparently triggered by link power state changes, and disk corruption as a result. If I recall correctly, this happened on at least 2.6.31, 2.6.32, and 2.6.35. The most recent example was a 2.6.35-based kernel running on a system with an Nvidia MCP67 AHCI controller [2] and a WD disk drive [3].

We are hoping that those working more closely with the SATA code might be aware of this issue. As the symptoms are so severe (data corruption), we have ALPM disabled globally, but this makes it hard to get more targeted information on affected platforms.

As getting testing is tricky, we are keen to get some advice on how we might better diagnose this issue should we be able to get some testing. We would also like to better understand what information is available and what is valuable in such a diagnosis. Perhaps someone remembers fixing it (for some other hardware).

* Is this problem likely related only to the controller, or may the drive have some influence as well? The diagnostics [4] sound a bit like the link fails to recover in the way it is supposed to.
* Does the error message already show sufficient information, or is there additional debug data that would be helpful, and if so, what would that be?

Any advice appreciated. Should we file a bugzilla bug report to discuss this?

Thanks.
Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/539467

[2]
00:09.0 IDE interface [0101]: nVidia Corporation MCP67 AHCI Controller [10de:0550] (rev a2) (prog-if 85 [Master SecO PriO])
	Subsystem: Acer Incorporated [ALI] Device [1025:0126]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0 (750ns min, 250ns max)
	Interrupt: pin A routed to IRQ 23
	Region 0: I/O ports at 30f0 [size=8]
	Region 1: I/O ports at 30e4 [size=4]
	Region 2: I/O ports at 30e8 [size=8]
	Region 3: I/O ports at 30e0 [size=4]
	Region 4: I/O ports at 30d0 [size=16]
	Region 5: Memory at d0884000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [8c] SATA HBA v1.0 InCfgSpace
	Capabilities: [b0] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [cc] HyperTransport: MSI Mapping Enable- Fixed+
	Kernel driver in use: ahci
	Kernel modules: ahci

[3]
 Model=WDC WD2500BEVS-22UST0, FwRev=01.01A01, SerialNo=WD-WXE108A79290
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

[4]
[12348.040077] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x150000 action 0x6 frozen
[12348.040086] ata3: SError: { PHYRdyChg CommWake Dispar }
[12348.040091] ata3.00: failed command: READ FPDMA QUEUED
[12348.040099] ata3.00: cmd 60/10:00:b0:94:c5/00:00:03:00:00/40 tag 0 ncq 8192 in
[12348.040101]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[12348.040104] ata3.00: status: { DRDY }
[12348.040112] ata3: hard resetting link
[12348.390082] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[12348.404414] ata3.00: configured for UDMA/133
[12348.404550] ata3.00: device reported invalid CHS sector 0
[12348.404570] ata3: EH complete
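
As an aid for anyone collecting reports like [4], the SErr and SStatus values in the log can be decoded by hand. What follows is only a minimal sketch in Python, not anything taken from the kernel tree. It assumes the SError bit positions and SStatus field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) from the SATA specification, and the short bit names are the ones libata prints as far as we remember them. The function names decode_serror and decode_sstatus are made up for this example.

```python
# Sketch only: bit positions assumed from the SATA SError register layout,
# short names assumed to match what libata prints in its "SError: { ... }" line.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}

# SStatus sub-field meanings (assumed from the SATA specification).
SPD_NAMES = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM_NAMES = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_serror(value):
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM fields."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": det,
        "speed": SPD_NAMES.get(spd, "unknown"),
        "ipm": IPM_NAMES.get(ipm, "unknown"),
    }


if __name__ == "__main__":
    # Values taken from the log in [4]; the kernel prints both in hex.
    print(decode_serror(0x150000))
    print(decode_sstatus(0x113))
```

For the values in [4], 0x150000 decodes to the same three flags the kernel printed (PHYRdyChg, CommWake, Dispar), and SStatus 113 shows the link back in the active power state at 1.5 Gbps after the hard reset. The IPM field is probably the most interesting one for ALPM reports, since it would show whether a link was in partial or slumber when a snapshot was taken.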