On 01/05/2010 11:56 AM, Marco Bisetto wrote:
Hi,

A problem with an "IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)" and two SAMSUNG HD103UJ SATA hard disk drives. Disabling write cache on a disk gives this error:

kernel: [ 45.584445] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
kernel: [ 45.584445] ata1: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
kernel: [ 45.584445] dhfis 0x1 dmafis 0x1 sdbfis 0x0
kernel: [ 45.584445] ata1: ATA_REG 0x40 ERR_REG 0x0
kernel: [ 45.584445] ata1: tag : dhfis dmafis sdbfis sacitve
kernel: [ 45.584445] ata1: tag 0x0: 1 1 0 1
kernel: [ 45.584445] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
kernel: [ 45.584445] ata1.00: cmd 61/08:00:3f:f4:e8/00:00:02:00:00/40 tag 0 ncq 4096 out
kernel: [ 45.584445] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
kernel: [ 45.584445] ata1.00: status: { DRDY }
kernel: [ 45.584445] ata1: hard resetting link

The error appears four times for each disk at startup, and only when a disk has write cache disabled.

For example, disabling write cache on both disks:

[ 45.595788] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 45.595800] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 76.491877] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 76.511331] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 107.391075] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 107.423701] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 138.287465] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 138.332093] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1

Disabling write cache on the disk attached to ata4 and enabling it on the disk attached to ata1:

[ 45.583489] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 76.479940] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 107.375643] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
[ 138.272023] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1

Enabling write cache on both disks = no errors.

I don't think the problem can be associated with bad cables or the power supply, since it happens on each channel, is the same for each disk, and happens at the same time. Does anybody have an idea what it could be, and whether there is a solution?
From what I can see, that debug output from sata_nv means that the drive hasn't reported that it has completed the command (no SDB FIS) after the timeout (usually 30 seconds). That's an awfully long time. It could be that those drives have issues with NCQ and disabled write cache, where some of the commands in the queue can be starved for overly long periods.
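
For anyone who wants to check the same two things against their own dmesg output, here is a rough Python sketch. It assumes the masks in the SWNCQ debug lines are per-tag bitmaps (bit N = tag N) and that a tag set in sactive but clear in sdbfis is a command the drive has not yet reported as done; that is my reading of the output, not something taken from the driver documentation. The helper names (parse_masks, tags_without_sdb, gaps) are made up for this example, and the input strings are copied from the report above.

```python
import re

# Lines copied from the report above (ata1, first failure).
EH_LINE = "ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1"
FIS_LINE = "dhfis 0x1 dmafis 0x1 sdbfis 0x0"

# Timestamps of the four ata4 error-handler entries from the report.
ATA4_EH_TIMES = [45.583489, 76.479940, 107.375643, 138.272023]


def parse_masks(text):
    """Return {name: int} for every 'name 0x...' pair found in text."""
    pairs = re.findall(r"(\w+) (0x[0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def tags_without_sdb(sactive, sdbfis, max_tags=32):
    """List tags marked active for which no SDB FIS has been recorded.

    Assumption: both arguments are bitmaps with one bit per NCQ tag.
    """
    pending = sactive & ~sdbfis
    return [tag for tag in range(max_tags) if pending & (1 << tag)]


def gaps(times):
    """Seconds between consecutive timestamps, rounded to milliseconds."""
    return [round(later - earlier, 3)
            for earlier, later in zip(times, times[1:])]


masks = parse_masks(EH_LINE + " " + FIS_LINE)
print("tags still waiting for SDB FIS:",
      tags_without_sdb(masks["sactive"], masks["sdbfis"]))
print("seconds between ata4 errors:", gaps(ATA4_EH_TIMES))
```

With the values from the report, tag 0 is the only outstanding command, and the ata4 errors come roughly 30.9 seconds apart, which would fit a 30 second command timeout plus the time taken by the link reset and retry.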
CCing linux-ide.