2012/3/17 Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>: > On Thu, Mar 15, 2012 at 05:19:15PM +0800, Elric Fu wrote: >> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>: >> >> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>: >> >> >> The Save/Restore state funtion of VIA xHCI host is unstable. So add >> >> >> the XHCI_RESET_ON_RESUME quirk for it. >> >> >> >> >> > >> >> > Can you provide more background of the issue? What's the unstable >> >> > behavior of VIA host after resume? >> >> >> >> The issue of using the Save/Restore state operation will cause the >> >> device attached to the port of host is disconnected. >> >> >> >> Sorry. My description about the issue is incorrect. If using the save/restore >> state function, after resume, the superspeed devices work fine but the usb >> 2.0 devices will be disconnected. It seems like the embedded 2.0 hub in >> VIA host controller can't be re-enumerated, so the 2.0 devices will be >> disconnected. I don't know why. >> >> > >> > Hmm. I think xHCI suspend path may need to be save DCBAA and all related >> > data structures before remove core well power. It's specified by xHCI >> > 1.0 errata. >> >> Well. Saving the DCBAA, Context and other data structures maybe fix this >> kind of issue. > > But the xHCI driver does save the DCBAA. It's in the xhci_hcd > structure. We restore the pointer to the dcbaa in > xhci_restore_registers(), which is called after a resume from suspend: > > static void xhci_restore_registers(struct xhci_hcd *xhci) > { > xhci_writel(xhci, xhci->s3.command, &xhci->op_regs->command); > xhci_writel(xhci, xhci->s3.dev_nt, &xhci->op_regs->dev_notification); > xhci_write_64(xhci, xhci->s3.dcbaa_ptr, &xhci->op_regs->dcbaa_ptr); > ... Sorry, I forgot that. But about this I have a question. The errata file specify the system software need to save a memory image of DCBAA, Contexts and other data structures. Why? May DCBAA and those data structures be corrupted? or other reasons? Driver have already kept the DCBAAP. I tested the save/restore state operation in VIA xHCI host. The superspeed device attached to the usb3 root hub works fine and the integrated 2.0 hub can't be re-enumerated. The address device command sent to integrated 2.0 hub was failed. That's a little stranger. The usb analyzer can't capture the trace between the root hub and integrated usb2.0 hub in VIA host. I need to discuss this with the FW team. > > We don't restore the register state after a hibernate, and I've been > wondering why. Andiry, I think you (or one of the AMD team) wrote the > suspend/resume code. Do you remember why we unconditionally do not > restore the registers on resume from hibernate? > > int xhci_resume(struct xhci_hcd *xhci, bool hibernated) > { > ... > if (!hibernated) { > /* restore register code */ > } > /* If restore operation fails, re-initialize the HC during resume */ > if ((temp & STS_SRE) || hibernated) { > } > > I know that power is completely removed during hibernate, so maybe we > just can't restore the registers? Heard from FW developer, if we reset and re-initialize the host the save and restore state function will be invalid. I think those states saved will be lost if we reset the host, and we have to reset the host as the power is removed. > > Now, I did look at the xHCI 1.0 spec errata from last June, and I think > we're saving and restoring the register state in the wrong order. > Patches to follow shortly. > >> >> >> When driver free Event Ring in the resume procedure, driver can't >> >> >> write zero to Event Ring Registers in VIA xHCI host, otherwise the >> >> >> VIA host can't be resumed. >> >> >> >> >> > >> >> > Strange. What do you mean by "can't resume"? Does xhci_resume() return >> >> > failure, or it returns success but host does not work after resume? >> >> >> >> The issue is the host will hang up. The reason is that the host has a >> >> hardware state machine, in the reset-on-resume procedure driver will >> >> reset the host first then free event ring and re-initialize the host. But >> >> if driver set 0 to event ring registers in xhci_mem_cleanup() after reset >> >> the host, the host will think the "0" is a vaild value, and fetch it. it >> >> result >> >> in the host hang up. >> >> >> > >> > xhci_halt() is called before xhci_reset(). Why does the host fetch event ring >> > registers after it's halted? >> >> I don't know. As far as I know, after reset the VIA host, the host >> will fetch the >> value of event ring register as long as the driver write those registers. Then >> after driver write 0 to those registers the host thinks the value is >> valid, maybe >> do something to event ring after fetching the value. I don't know the more >> details about it. >> >> >> > If the host needs a reset-on-resume, it will perform a complete >> >> > re-initialization. After xhci_mem_cleanup() called, xhci_resume() will >> >> > call xhci_init(), in which the ERST registers are re-written. Have you >> >> > checked the value, see if it's different form before suspend? >> >> >> >> I don't think in reset-on-resume write 0 to event Ring registers is >> >> necessary. >> >> As driver already reset the host before xhci_mem_cleanup(), the value of >> >> event ring registers will be 0. > > As Eric pointed out, the solution seems to just be to not write zeroes > to the DCBAA on memory clean up. After halt and reset, the xHCI host > DCBAA entry will be reset to its default (zero), so there's no reason to > write it again. If the host *isn't* halted because the stop command > write timed out, we really don't want to write anything to the DCBAA. > About this I talked with the FW developers. They said the chip had a hardware state machine and after the driver reset the host the state machine transitioned into a state (I don't know what state) and host will fetch the event ring registers as long as driver write those registers. So driver call the xhci_mem_cleanup() after xhci_reset(), the host will think 0 wrote by xhci_mem_cleanup() is valid. That result in the host is in trouble. But your opinion may be right. I need to test and verify it. >> >> > Also, note xhci_mem_cleanup is called not only by xhci_resume(), but also >> >> > by xhci_stop(). This means you may find the same issue by rmmod/modprobe >> >> > xhci-hcd. >> >> >> >> rmmod/modprobe xhci-hcd is ok. I have already tested it. I think it is >> >> ok if we write >> >> a valid value except for 0 to event ring registers in VIA xHCI host >> >> after reset the >> >> host. > > That's very strange. The shutdown code in the failed restore case is > nearly identical to the code in xhci_stop() that's called in the > shutdown case. The only difference is that xhci_halt() is called twice > in the shutdown case, so maybe the VIA host is not actually halted on > the failed register restore path? Maybe we need to increase > XHCI_MAX_HALT_USEC? > > Actually, the shutdown and hibernate/failed register restore paths > functions desperately need to be refactored to share code. It's > basically (but not quite) copy-paste between the two functions. For > instance, there's a call to usb_amd_dev_put() in xhci_stop(), but not in > the failed register restore/hibernate case. > > Andiry, is that the PLL shutdown behavior you wanted? > > Eric, I'm going to send you a couple of patches, and I want you to apply > them one at a time and try suspending and resuming (hopefully with > netconsole or something so you can capture log messages on resume). > > The first patch adds a warning to xhci_halt in case the host doesn't > actually halt. We should see if your failed resume is due to the host > not actually being halted when DCBAA is written. > > The second removes the writes of zero to the DCBAA and command ring > pointers in xhci_mem_cleanup(), which should remove the hang all > together. > > The third increases the xHCI halt timeout, and should only be applied if > you find the host isn't halting in time. > > The fourth changes the order of the register saves and restores to match > the xHCI 1.0 errata. That may make your host indicate the register > restore actually worked, which will mean the xHCI driver doesn't need to > re-initialize the host at all. OK. I will apply and test those patches. Thanks. Best Regards, Elric Fu > > Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html