RE: [PATCH 2/2] xHCI: fix bug of resume for VIA xHCI host

"Xu, Andiry" <Andiry.Xu@xxxxxxx> · Mon, 19 Mar 2012 02:40:00 +0000

> -----Original Message-----
> From: linux-usb-owner@xxxxxxxxxxxxxxx [mailto:linux-usb-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Sarah Sharp
> Sent: Saturday, March 17, 2012 4:09 AM
> To: Elric Fu
> Cc: Xu, Andiry; linux-usb@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 2/2] xHCI: fix bug of resume for VIA xHCI host
> 
> On Thu, Mar 15, 2012 at 05:19:15PM +0800, Elric Fu wrote:
> > 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>:
> > >> 2012/3/15 Xu, Andiry <Andiry.Xu@xxxxxxx>:
> > >> >> The Save/Restore state funtion of VIA xHCI host is unstable. So add
> > >> >> the XHCI_RESET_ON_RESUME quirk for it.
> > >> >>
> > >> >
> > >> > Can you provide more background of the issue? What's the unstable
> > >> > behavior of VIA host after resume?
> > >>
> > >> The issue of using the Save/Restore state operation will cause the
> > >> device attached to the port of host is disconnected.
> > >>
> >
> > Sorry. My description about the issue is incorrect. If using the
> save/restore
> > state function, after resume, the superspeed devices work fine but the
> usb
> > 2.0 devices will be disconnected. It seems like the embedded 2.0 hub in
> > VIA host controller can't be re-enumerated, so the 2.0 devices will be
> > disconnected. I don't know why.
> >
> > >
> > > Hmm. I think xHCI suspend path may need to be save DCBAA and all
> related
> > > data structures before remove core well power. It's specified by xHCI
> > > 1.0 errata.
> >
> > Well. Saving the DCBAA, Context and other data structures maybe fix this
> > kind of issue.
> 
> But the xHCI driver does save the DCBAA.  It's in the xhci_hcd
> structure.  We restore the pointer to the dcbaa in
> xhci_restore_registers(), which is called after a resume from suspend:
> 
> static void xhci_restore_registers(struct xhci_hcd *xhci)
> {
>         xhci_writel(xhci, xhci->s3.command, &xhci->op_regs->command);
>         xhci_writel(xhci, xhci->s3.dev_nt, &xhci->op_regs-
> >dev_notification);
>         xhci_write_64(xhci, xhci->s3.dcbaa_ptr, &xhci->op_regs->dcbaa_ptr);
> ...
> 
> We don't restore the register state after a hibernate, and I've been
> wondering why.  Andiry, I think you (or one of the AMD team) wrote the
> suspend/resume code.  Do you remember why we unconditionally do not
> restore the registers on resume from hibernate?
> 

In my original design, xhci_resume() does not perform complete
cleanup/initialization in hibernation. It just reset the xHC, in that case
we need to restore the registers. I got information that Windows USB3.0 driver
is also doing this way. However, you pointed out that we need a complete
re-initialization, so I removed the registers restore part, because all
the related registers will be re-written during xhci_init().

> int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
> {
> ...
>         if (!hibernated) {
> 		/* restore register code */
>         }
>         /* If restore operation fails, re-initialize the HC during resume
> */
>         if ((temp & STS_SRE) || hibernated) {
> 	}
> 
> I know that power is completely removed during hibernate, so maybe we
> just can't restore the registers?
> 
> Now, I did look at the xHCI 1.0 spec errata from last June, and I think
> we're saving and restoring the register state in the wrong order.
> Patches to follow shortly.
> 
> > >> >> When driver free Event Ring in the resume procedure, driver can't
> > >> >> write zero to Event Ring Registers in VIA xHCI host, otherwise the
> > >> >> VIA host can't be resumed.
> > >> >>
> > >> >
> > >> > Strange. What do you mean by "can't resume"? Does xhci_resume()
> return
> > >> > failure, or it returns success but host does not work after resume?
> > >>
> > >> The issue is the host will hang up. The reason is that the host has a
> > >> hardware state machine, in the reset-on-resume procedure driver will
> > >> reset the host first then free event ring and re-initialize the host.
> But
> > >> if driver set 0 to event ring registers in xhci_mem_cleanup() after
> reset
> > >> the host, the host will think the "0" is a vaild value, and fetch it.
> it
> > >> result
> > >> in the host hang up.
> > >>
> > >
> > > xhci_halt() is called before xhci_reset(). Why does the host fetch
> event ring
> > > registers after it's halted?
> >
> > I don't know. As far as I know, after reset the VIA host,  the host
> > will fetch the
> > value of event ring register as long as the driver write those registers.
> Then
> > after driver write 0 to those registers the host thinks the value is
> > valid, maybe
> > do something to event ring after fetching the value. I don't know the
> more
> > details about it.
> >
> > >> > If the host needs a reset-on-resume, it will perform a complete
> > >> > re-initialization. After xhci_mem_cleanup() called, xhci_resume()
> will
> > >> > call xhci_init(), in which the ERST registers are re-written. Have
> you
> > >> > checked the value, see if it's different form before suspend?
> > >>
> > >> I don't think in reset-on-resume write 0 to event Ring registers is
> > >> necessary.
> > >> As driver already reset the host before xhci_mem_cleanup(), the value
> of
> > >> event ring registers will be 0.
> 
> As Eric pointed out, the solution seems to just be to not write zeroes
> to the DCBAA on memory clean up.  After halt and reset, the xHCI host
> DCBAA entry will be reset to its default (zero), so there's no reason to
> write it again.  If the host *isn't* halted because the stop command
> write timed out, we really don't want to write anything to the DCBAA.
> 
> > >> > Also, note xhci_mem_cleanup is called not only by xhci_resume(), but
> also
> > >> > by xhci_stop(). This means you may find the same issue by
> rmmod/modprobe
> > >> > xhci-hcd.
> > >>
> > >> rmmod/modprobe xhci-hcd is ok. I have already tested it. I think it is
> > >> ok if we write
> > >> a valid value except for 0 to event ring registers in VIA xHCI host
> > >> after reset the
> > >> host.
> 
> That's very strange.  The shutdown code in the failed restore case is
> nearly identical to the code in xhci_stop() that's called in the
> shutdown case.  The only difference is that xhci_halt() is called twice
> in the shutdown case, so maybe the VIA host is not actually halted on
> the failed register restore path?  Maybe we need to increase
> XHCI_MAX_HALT_USEC?
> 
> Actually, the shutdown and hibernate/failed register restore paths
> functions desperately need to be refactored to share code.  It's
> basically (but not quite) copy-paste between the two functions.  For
> instance, there's a call to usb_amd_dev_put() in xhci_stop(), but not in
> the failed register restore/hibernate case.
> 
> Andiry, is that the PLL shutdown behavior you wanted?
> 

Yes. Because the match of usb_amd_dev_put() is in xhci_pci_setup(),
not in xhci_init().

Thanks,
Andiry

> Eric, I'm going to send you a couple of patches, and I want you to apply
> them one at a time and try suspending and resuming (hopefully with
> netconsole or something so you can capture log messages on resume).
> 
> The first patch adds a warning to xhci_halt in case the host doesn't
> actually halt.  We should see if your failed resume is due to the host
> not actually being halted when DCBAA is written.
> 
> The second removes the writes of zero to the DCBAA and command ring
> pointers in xhci_mem_cleanup(), which should remove the hang all
> together.
> 
> The third increases the xHCI halt timeout, and should only be applied if
> you find the host isn't halting in time.
> 
> The fourth changes the order of the register saves and restores to match
> the xHCI 1.0 errata.  That may make your host indicate the register
> restore actually worked, which will mean the xHCI driver doesn't need to
> re-initialize the host at all.
> 
> Sarah Sharp
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html