Re: ahci sometimes fails to suspend controller

On Tuesday 04 August 2009, Tejun Heo wrote:
> Hello, Benjamin.
> 
> Benjamin S. wrote:
> >> Can you please attach full log?  I'm curious what exactly went down.
> > 
> > Sure. Do you think the system should still be able to resume although 
> > the revalidation failed while suspending (see line [299208.016116])?
> 
> Interesting.  This is the first time I see it failing this way.
> 
> [--snip--]
> > [299202.632167] ahci 0000:00:11.0: suspend
> > [299203.016052] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [299208.016032] ata3.00: qc timeout (cmd 0xec)
> > [299208.016078] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > [299208.016116] ata3.00: revalidation failed (errno=-5)
> 
> This shouldn't have happened.  The kernel is visiting each device and
> suspending it.  The process is ordered such that dependent devices
> always go to sleep first.  For some reason, something bad happens to
> the ATA controller while other parts of the system are going to sleep
> and I don't think it's solely software given the problem happens only
> after a lot of trials.
> 
> [--snip--]
> > [299249.128051] ata2: SATA link down (SStatus 0 SControl 300)
> > [299249.128117] ata4: SATA link down (SStatus 0 SControl 300)
> > [299249.128183] ata1: SATA link down (SStatus 0 SControl 300)
> > [299249.156033] sd 2:0:0:0: legacy resume
> > [299249.156037] sd 2:0:0:0: [sda] Starting disk
> > [299254.172018] ata3: link is slow to respond, please be patient (ready=0)
> > [299255.964034] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> 
> And it looks like the device could operate normally after resume.
> 
> The error messages are from SCSI layer which now realized that the ATA
> device is gone.
> 
> >>> Does that mean the SATA MSI quirk won't solve my problem?
> >> I think it's likely a different issue.  Can you please try to
> >> reproduce the problem and see how many tries it usually takes?
> > 
> > This time there were 79 successful resumes, and the 80th one did not
> > succeed.
> > 
> > Because I never shut down my system, I will have to reproduce it by
> > force, but I am going to try to script it a little so that it suspends
> > and resumes automatically, in order to get the next results faster.
> 
> Does irqpoll help?
> 
> cc'ing Rafael.  Rafael, is there any chance that we're suspending
> things in the wrong order?

If the kernel is older than 2.6.30, that may be a manifestation of the issue
described in http://www.sisk.pl/kernel/LS/2009/pci_resume/ .

Unfortunately, the patches that fixed it and went into 2.6.29 and 2.6.30
caused some suspend-resume regressions that are still unresolved, mostly on
powerpc.

I'd recommend trying 2.6.30.y (from kernel.org) to see if the issue is still
there.
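
For a quick check of whether a given box is running a kernel older than
2.6.30, something along the lines of the sketch below might do. It only
compares the leading numeric part of the release string, and the helper
names (parse_release, is_older_than_baseline) are made up for illustration;
they are not part of any existing tool.

```python
import platform
import re

# Baseline taken from the message: kernels older than 2.6.30 may be affected.
BASELINE = (2, 6, 30)


def parse_release(release):
    """Return the leading numeric part of a kernel release string as a tuple.

    For example "2.6.29-2-amd64" becomes (2, 6, 29). A missing third
    component is treated as 0. Stable (.y) and -rc suffixes are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than_baseline(release, baseline=BASELINE):
    """True if the release sorts strictly before the baseline version."""
    return parse_release(release) < baseline


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r` on Linux.
    running = platform.release()
    try:
        older = is_older_than_baseline(running)
    except ValueError as err:
        print(err)
    else:
        verdict = "older than" if older else "at or after"
        print("%s is %s 2.6.30" % (running, verdict))
```

Note that this says nothing about distribution kernels that carry backported
patches, so the result is only a first hint.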

Best,
Rafael