Re: [PATCH 2/2] xHCI: fix bug of resume for VIA xHCI host

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Fri, 16 Mar 2012 13:08:32 -0700

On Thu, Mar 15, 2012 at 05:19:15PM +0800, Elric Fu wrote:
> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>:
> >> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>:
> >> >> The Save/Restore state funtion of VIA xHCI host is unstable. So add
> >> >> the XHCI_RESET_ON_RESUME quirk for it.
> >> >>
> >> >
> >> > Can you provide more background of the issue? What's the unstable
> >> > behavior of VIA host after resume?
> >>
> >> The issue of using the Save/Restore state operation will cause the
> >> device attached to the port of host is disconnected.
> >>
> 
> Sorry. My description about the issue is incorrect. If using the save/restore
> state function, after resume, the superspeed devices work fine but the usb
> 2.0 devices will be disconnected. It seems like the embedded 2.0 hub in
> VIA host controller can't be re-enumerated, so the 2.0 devices will be
> disconnected. I don't know why.
> 
> >
> > Hmm. I think xHCI suspend path may need to be save DCBAA and all related
> > data structures before remove core well power. It's specified by xHCI
> > 1.0 errata.
> 
> Well. Saving the DCBAA, Context and other data structures maybe fix this
> kind of issue.

But the xHCI driver does save the DCBAA.  It's in the xhci_hcd
structure.  We restore the pointer to the dcbaa in
xhci_restore_registers(), which is called after a resume from suspend:

static void xhci_restore_registers(struct xhci_hcd *xhci)
{
        xhci_writel(xhci, xhci->s3.command, &xhci->op_regs->command);
        xhci_writel(xhci, xhci->s3.dev_nt, &xhci->op_regs->dev_notification);
        xhci_write_64(xhci, xhci->s3.dcbaa_ptr, &xhci->op_regs->dcbaa_ptr);
...

We don't restore the register state after a hibernate, and I've been
wondering why.  Andiry, I think you (or one of the AMD team) wrote the
suspend/resume code.  Do you remember why we unconditionally do not
restore the registers on resume from hibernate?

int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
{
...
        if (!hibernated) {
		/* restore register code */
        }
        /* If restore operation fails, re-initialize the HC during resume */
        if ((temp & STS_SRE) || hibernated) {
	}

I know that power is completely removed during hibernate, so maybe we
just can't restore the registers?

Now, I did look at the xHCI 1.0 spec errata from last June, and I think
we're saving and restoring the register state in the wrong order.
Patches to follow shortly.

> >> >> When driver free Event Ring in the resume procedure, driver can't
> >> >> write zero to Event Ring Registers in VIA xHCI host, otherwise the
> >> >> VIA host can't be resumed.
> >> >>
> >> >
> >> > Strange. What do you mean by "can't resume"? Does xhci_resume() return
> >> > failure, or it returns success but host does not work after resume?
> >>
> >> The issue is the host will hang up. The reason is that the host has a
> >> hardware state machine, in the reset-on-resume procedure driver will
> >> reset the host first then free event ring and re-initialize the host. But
> >> if driver set 0 to event ring registers in xhci_mem_cleanup() after reset
> >> the host, the host will think the "0" is a vaild value, and fetch it. it
> >> result
> >> in the host hang up.
> >>
> >
> > xhci_halt() is called before xhci_reset(). Why does the host fetch event ring
> > registers after it's halted?
> 
> I don't know. As far as I know, after reset the VIA host,  the host
> will fetch the
> value of event ring register as long as the driver write those registers. Then
> after driver write 0 to those registers the host thinks the value is
> valid, maybe
> do something to event ring after fetching the value. I don't know the more
> details about it.
>
> >> > If the host needs a reset-on-resume, it will perform a complete
> >> > re-initialization. After xhci_mem_cleanup() called, xhci_resume() will
> >> > call xhci_init(), in which the ERST registers are re-written. Have you
> >> > checked the value, see if it's different form before suspend?
> >>
> >> I don't think in reset-on-resume write 0 to event Ring registers is
> >> necessary.
> >> As driver already reset the host before xhci_mem_cleanup(), the value of
> >> event ring registers will be 0.

As Eric pointed out, the solution seems to just be to not write zeroes
to the DCBAA on memory clean up.  After halt and reset, the xHCI host
DCBAA entry will be reset to its default (zero), so there's no reason to
write it again.  If the host *isn't* halted because the stop command
write timed out, we really don't want to write anything to the DCBAA.

> >> > Also, note xhci_mem_cleanup is called not only by xhci_resume(), but also
> >> > by xhci_stop(). This means you may find the same issue by rmmod/modprobe
> >> > xhci-hcd.
> >>
> >> rmmod/modprobe xhci-hcd is ok. I have already tested it. I think it is
> >> ok if we write
> >> a valid value except for 0 to event ring registers in VIA xHCI host
> >> after reset the
> >> host.

That's very strange.  The shutdown code in the failed restore case is
nearly identical to the code in xhci_stop() that's called in the
shutdown case.  The only difference is that xhci_halt() is called twice
in the shutdown case, so maybe the VIA host is not actually halted on
the failed register restore path?  Maybe we need to increase
XHCI_MAX_HALT_USEC?

Actually, the shutdown and hibernate/failed register restore paths
functions desperately need to be refactored to share code.  It's
basically (but not quite) copy-paste between the two functions.  For
instance, there's a call to usb_amd_dev_put() in xhci_stop(), but not in
the failed register restore/hibernate case.

Andiry, is that the PLL shutdown behavior you wanted?

Eric, I'm going to send you a couple of patches, and I want you to apply
them one at a time and try suspending and resuming (hopefully with
netconsole or something so you can capture log messages on resume).

The first patch adds a warning to xhci_halt in case the host doesn't
actually halt.  We should see if your failed resume is due to the host
not actually being halted when DCBAA is written.

The second removes the writes of zero to the DCBAA and command ring
pointers in xhci_mem_cleanup(), which should remove the hang all
together.

The third increases the xHCI halt timeout, and should only be applied if
you find the host isn't halting in time.

The fourth changes the order of the register saves and restores to match
the xHCI 1.0 errata.  That may make your host indicate the register
restore actually worked, which will mean the xHCI driver doesn't need to
re-initialize the host at all.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html