Tejun Heo wrote:
Hello, all.

This has been going on for quite some time now, but I finally succeeded in reproducing the problem and finding out what has been going on. It wasn't the drive's or the controller's fault. The spurious completion detection logic was wrong, which makes all of this my fault. :-)

The attached patch induces NCQ spurious completions by inserting artificial delays during irq handling. The following is the log with the patch applied.

A [ 1125.478813] ata35: MON issue=0x0 SAct=0x1 sactive=0x3 SDB FIS=004040a1:00000002
B [ 1125.480248] ata35: MON issue=0x4 SAct=0x6 sactive=0x7 SDB FIS=004040a1:00000001
C [ 1125.481614] ata35: MON issue=0x0 SAct=0x5 sactive=0x7 SDB FIS=004040a1:00000002
D [ 1125.481704] ata35: YYY 0x2 -> 0x4
E [ 1125.481722] ata35: XXX issue=0x0 SAct=0x1 sactive=0x1 SDB FIS=004040a1:00000004
F [ 1125.483087] ata35: MON issue=0x0 SAct=0x0 sactive=0x1 SDB FIS=004040a1:00000001
G [ 1125.484297] ata35: MON issue=0x4 SAct=0x6 sactive=0x7 SDB FIS=004040a1:00000001
Thanks a lot for tracking this down, and thanks even more for being humble enough to admit mistakes. More kernel hackers should follow your example.
I continue to be a proud mentor, watching you kick ass on the Linux kernel scene :)
Jeff