On Thu, Mar 15, 2012 at 05:19:15PM +0800, Elric Fu wrote: > 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>: > >> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>: > >> >> The Save/Restore state funtion of VIA xHCI host is unstable. So add > >> >> the XHCI_RESET_ON_RESUME quirk for it. > >> >> > >> > > >> > Can you provide more background of the issue? What's the unstable > >> > behavior of VIA host after resume? > >> > >> The issue of using the Save/Restore state operation will cause the > >> device attached to the port of host is disconnected. > >> > > Sorry. My description about the issue is incorrect. If using the save/restore > state function, after resume, the superspeed devices work fine but the usb > 2.0 devices will be disconnected. It seems like the embedded 2.0 hub in > VIA host controller can't be re-enumerated, so the 2.0 devices will be > disconnected. I don't know why. > > > > > Hmm. I think xHCI suspend path may need to be save DCBAA and all related > > data structures before remove core well power. It's specified by xHCI > > 1.0 errata. > > Well. Saving the DCBAA, Context and other data structures maybe fix this > kind of issue. But the xHCI driver does save the DCBAA. It's in the xhci_hcd structure. We restore the pointer to the dcbaa in xhci_restore_registers(), which is called after a resume from suspend: static void xhci_restore_registers(struct xhci_hcd *xhci) { xhci_writel(xhci, xhci->s3.command, &xhci->op_regs->command); xhci_writel(xhci, xhci->s3.dev_nt, &xhci->op_regs->dev_notification); xhci_write_64(xhci, xhci->s3.dcbaa_ptr, &xhci->op_regs->dcbaa_ptr); ... We don't restore the register state after a hibernate, and I've been wondering why. Andiry, I think you (or one of the AMD team) wrote the suspend/resume code. Do you remember why we unconditionally do not restore the registers on resume from hibernate? int xhci_resume(struct xhci_hcd *xhci, bool hibernated) { ... if (!hibernated) { /* restore register code */ } /* If restore operation fails, re-initialize the HC during resume */ if ((temp & STS_SRE) || hibernated) { } I know that power is completely removed during hibernate, so maybe we just can't restore the registers? Now, I did look at the xHCI 1.0 spec errata from last June, and I think we're saving and restoring the register state in the wrong order. Patches to follow shortly. > >> >> When driver free Event Ring in the resume procedure, driver can't > >> >> write zero to Event Ring Registers in VIA xHCI host, otherwise the > >> >> VIA host can't be resumed. > >> >> > >> > > >> > Strange. What do you mean by "can't resume"? Does xhci_resume() return > >> > failure, or it returns success but host does not work after resume? > >> > >> The issue is the host will hang up. The reason is that the host has a > >> hardware state machine, in the reset-on-resume procedure driver will > >> reset the host first then free event ring and re-initialize the host. But > >> if driver set 0 to event ring registers in xhci_mem_cleanup() after reset > >> the host, the host will think the "0" is a vaild value, and fetch it. it > >> result > >> in the host hang up. > >> > > > > xhci_halt() is called before xhci_reset(). Why does the host fetch event ring > > registers after it's halted? > > I don't know. As far as I know, after reset the VIA host, the host > will fetch the > value of event ring register as long as the driver write those registers. Then > after driver write 0 to those registers the host thinks the value is > valid, maybe > do something to event ring after fetching the value. I don't know the more > details about it. > > >> > If the host needs a reset-on-resume, it will perform a complete > >> > re-initialization. After xhci_mem_cleanup() called, xhci_resume() will > >> > call xhci_init(), in which the ERST registers are re-written. Have you > >> > checked the value, see if it's different form before suspend? > >> > >> I don't think in reset-on-resume write 0 to event Ring registers is > >> necessary. > >> As driver already reset the host before xhci_mem_cleanup(), the value of > >> event ring registers will be 0. As Eric pointed out, the solution seems to just be to not write zeroes to the DCBAA on memory clean up. After halt and reset, the xHCI host DCBAA entry will be reset to its default (zero), so there's no reason to write it again. If the host *isn't* halted because the stop command write timed out, we really don't want to write anything to the DCBAA. > >> > Also, note xhci_mem_cleanup is called not only by xhci_resume(), but also > >> > by xhci_stop(). This means you may find the same issue by rmmod/modprobe > >> > xhci-hcd. > >> > >> rmmod/modprobe xhci-hcd is ok. I have already tested it. I think it is > >> ok if we write > >> a valid value except for 0 to event ring registers in VIA xHCI host > >> after reset the > >> host. That's very strange. The shutdown code in the failed restore case is nearly identical to the code in xhci_stop() that's called in the shutdown case. The only difference is that xhci_halt() is called twice in the shutdown case, so maybe the VIA host is not actually halted on the failed register restore path? Maybe we need to increase XHCI_MAX_HALT_USEC? Actually, the shutdown and hibernate/failed register restore paths functions desperately need to be refactored to share code. It's basically (but not quite) copy-paste between the two functions. For instance, there's a call to usb_amd_dev_put() in xhci_stop(), but not in the failed register restore/hibernate case. Andiry, is that the PLL shutdown behavior you wanted? Eric, I'm going to send you a couple of patches, and I want you to apply them one at a time and try suspending and resuming (hopefully with netconsole or something so you can capture log messages on resume). The first patch adds a warning to xhci_halt in case the host doesn't actually halt. We should see if your failed resume is due to the host not actually being halted when DCBAA is written. The second removes the writes of zero to the DCBAA and command ring pointers in xhci_mem_cleanup(), which should remove the hang all together. The third increases the xHCI halt timeout, and should only be applied if you find the host isn't halting in time. The fourth changes the order of the register saves and restores to match the xHCI 1.0 errata. That may make your host indicate the register restore actually worked, which will mean the xHCI driver doesn't need to re-initialize the host at all. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html