Just adding some info here!
I added this to the bottom of ata_interrupt() in libata-core.c, which fixed the
problem:
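	/* Debug hack: no port claimed this IRQ, so force-clear the BMDMA
	 * interrupt status on every port and claim the IRQ anyway. */
	if (!handled) {
		printk(KERN_WARNING "ata_interrupt nobody cared. Trying to clear irq src\n");
		for (i = 0; i < host->n_ports; i++) {
			struct ata_port *ap = host->ports[i];

			if (ap)	/* the normal dispatch loop also checks for NULL */
				ata_bmdma_irq_clear(ap);
		}
		handled = 1;	/* claim it so the kernel does not disable the line */
	}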
The result was that the above message appeared three times in a row during
resume, then stopped, and everything worked. I also noticed that ata_host_intr
is not called in these cases, so when the interrupt reaches the driver after
resume, the driver ignores it, probably because it thinks it has no active QC
(probably correctly). The question is: where is the IRQ coming from, then?
Obviously this is a horribly wrong fix: if the interrupt is shared, we will
shadow the other device's interrupt so that its handler never gets run (and we
may corrupt our own BMDMA operations).
It is a bit troubling that it seems to happen three times in a row, so
something simple like clearing the BM IRQ status bit during the first stage of
resume is perhaps not enough (and I guess the driver already does that when
re-initializing?).
For reference, I found this message from November about ICH7 and spurious BM
interrupts, with a (not optimal) solution:
http://marc.theaimsgroup.com/?l=linux-ide&m=116373296023279&w=2
/Bjorn
On Thu, 4 Jan 2007, Bjorn Wesen wrote:
Hi folks,
Only one thing currently keeps suspend-to-RAM from working on my Sony Vaio SZ2
laptop: most of the time when I suspend/resume, the SATA HD becomes blocked.
Looking closer, this is because an IRQ occurs during resume and reaches
ata_interrupt, which does not handle it, and Linux responds by disabling the
IRQ permanently (the dreaded "irq X: nobody cared").
The SZ2 has a Core Duo CPU, ICH7 and SATA is handled by the ata_piix driver.
No other PCI devices have interrupts mapped to or enabled on the same IRQ line.
I can't help thinking this looks like a race, because sometimes it resumes
correctly and the HD then works perfectly fine. Perhaps the SATA driver hasn't
finished recovering when the first IRQ occurs, and so decides it shouldn't
handle it?
I can't paste the dmesg output because the HD locks up when this happens, so
the log never gets written to disk. Essentially, though, the "nobody cared"
message appears after the second CPU core is brought up; then the IRQ is
disabled, the ATA driver tries to reconfigure the devices, and then it locks
up.
I can debug it, but I'd need some pointers on where to start. For example, one
strategy could be to somehow force an acknowledge of the interrupt at the
bottom of ata_interrupt when it decides it can't handle an interrupt (I've only
programmed embedded Linux before, not i386, so I don't know where to ack such
an IRQ: in the PCI bridge itself, or in the ATA device? :).
Another strategy could be to find out why the interrupt is not handled by
enabling some debug output or similar, which I haven't really looked into
yet... Any ideas?
Regards,
Bjorn
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html