On Tuesday 04 August 2009, Tejun Heo wrote:
> Hello, Benjamin.
>
> Benjamin S. wrote:
> >> Can you please attach full log? I'm curious what exactly went down.
> >
> > Sure. Do you think the system should still be able to resume although
> > the revalidation failed while suspending (see line [299208.016116])?
>
> Interesting. This is the first time I see it failing this way.
>
> > [--snip--]
> > [299202.632167] ahci 0000:00:11.0: suspend
> > [299203.016052] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [299208.016032] ata3.00: qc timeout (cmd 0xec)
> > [299208.016078] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [299208.016116] ata3.00: revalidation failed (errno=-5)
>
> This shouldn't have happened. The kernel is visiting each device and
> suspending it. The process is ordered such that dependent devices
> always go to sleep first. For some reason, something bad happens to
> the ATA controller while other parts of the system are going to sleep
> and I don't think it's solely software given the problem happens only
> after a lot of trials.
>
> > [--snip--]
> > [299249.128051] ata2: SATA link down (SStatus 0 SControl 300)
> > [299249.128117] ata4: SATA link down (SStatus 0 SControl 300)
> > [299249.128183] ata1: SATA link down (SStatus 0 SControl 300)
> > [299249.156033] sd 2:0:0:0: legacy resume
> > [299249.156037] sd 2:0:0:0: [sda] Starting disk
> > [299254.172018] ata3: link is slow to respond, please be patient (ready=0)
> > [299255.964034] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>
> And it looks like the device could operate normally after resume.
>
> The error messages are from SCSI layer which now realized that the ATA
> device is gone.
>
> >>> Does that mean the SATA MSI quirk won't solve my problem?
> >> I think it's likely a different issue. Can you please try to
> >> reproduce the problem and see how many tries it usually takes?
> >
> > This time it were 79 successful resumes and the 80th one did not
> > succeed.
> >
> > Because I never shutdown my system I will reproduce it by force,
> > but I am going to try to script a little bit to automatically
> > suspend and resume in order to get the next results faster.
>
> Does irqpoll help?
>
> cc'ing Rafael. Rafael, is there any chance that we're suspending
> things in the wrong order?

If the kernel is older than 2.6.30, that may be a manifestation of the
issue described in http://www.sisk.pl/kernel/LS/2009/pci_resume/ .

Unfortunately, the patches that fixed it and went into 2.6.29 and 2.6.30
caused some suspend-resume regressions that are still unresolved, mostly
on powerpc.

I'd recommend trying 2.6.30.y (from kernel.org) to see if the issue is
still there.

Best,
Rafael