Re: usbcore and root-hub suspend/wakeup races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 02, 2012 at 02:31:15PM -0500, Alan Stern wrote:
> On Wed, 1 Feb 2012, Sarah Sharp wrote:
> 
> > Well, it's a moot point now, because looking at the xhci_bus_suspend
> > method, it has to issue several commands and wait for the host
> > controller to finish processing them before it can return, if there's a
> > port that's not suspended:
> 
> ...
> 
> > Is there any reason why the USB core would be suspending a bus if all
> > attached devices were not suspended?
> 
> Yes, as a matter of fact.  This happens during hibernation, in the
> PMSG_FREEZE stage.  The idea is that everything is supposed to be
> quiescent (no IRQs or DMA) but should remain at full power.  The USB
> core calls the suspend methods for all the device drivers, so all URBs
> should be cancelled, but it doesn't actually suspend the devices.  And
> it does call the bus_suspend methods for the HCDs, because the host
> controllers are what actually generate IRQs and do DMA.

Why doesn't it actually suspend the devices?  Because some of them might
react badly to suspend?

> >  I think this shouldn't be an issue
> > if we're disabling IRQs in xhci_bus_suspend(), and making sure that
> > we're not in the middle of a resume.
> 
> What exactly do you mean by "disabling IRQs"?  People tend to use terms 
> like that somewhat carelessly.  Do you mean disabling an IRQ line?  Or 
> disabling interrupts on some or all CPUs?  In general those sorts of 
> things should be avoided.

Eh, disreguard that.  I was thinking the two options were taking the
xHCI spinlock and disabling IRQs (which xhci_bus_suspend already does),
or disabling the MSI/MSI-X interrupt vectors all together (which Arjan
and HPA nixed).

> > As to your question about the INTE bit, I don't think I really want to
> > do that.  I think I have to stop the host (set the stop bit) to disable
> > interrupts that way.  From a quick search through the xHCI spec, it only
> > talks about turning that bit on during initialization.  I can't find any
> > text about what the host is supposed to do if that bit gets turned off
> > while it's running.
> > 
> > The reason I'm concerned about setting the INTE bit while the host is
> > running is because xhci_bus_suspend is can be called for one portion of
> > the roothub, while the other is still trying to run.  E.g. the USB 2.0
> > bus is being suspended while the USB 3.0 bus is still active.
> 
> That is a good cause for concern.
> 
> > However, both Arjan and HPA say that if the device tries to interrupt
> > while it's MSI-X vector is disabled, there's no guarantee that the xHCI
> > driver will get the interrupt later.  So disabling MSI or MSI-X vectors
> > doesn't seem to be a good solution.
> > 
> > Maybe I just need to look for the resume bits in the PCI suspend
> > function (xhci_suspend), and bail out there if they're set?
> 
> Well, there are two things to worry about.  The first is that you don't
> want xhci_bus_suspend to return successfully if wakeup is enabled and
> any ports have generated a wakeup event.  The second is that you don't
> want the interrupt handler routine to be modifying port settings
> concurrently with xhci_bus_suspend.

For the first concern, I think there will always be a race condition
between the xhci_bus_suspend and the port status interrupt handler.  If
the USB 2.0 bus is active, and we're trying to suspend the USB 3.0 bus,
we drop the xhci->lock at the end of bus suspend, and it's possible to
get an interrupt with a device-initiated resume port status change
before xhci_bus_suspend returns.  I don't see a way around that race.

Initially, I wondered if this race condition was actually an issue.  Say
we suspended both buses, but the interrupt for start of a
device-initiated resume crept in before we suspended the PCI device.
We'd clear the PLC bit, set the link state to U0 to stop the resume
signaling, and not kick the roothub, counting on the second PLC change
to generate an interrupt.  What if the PCI device is suspended then?

We would be basically suspending a host while it's in the middle of
reflecting device resume signaling, and I don't think there's any
guarantee we would get that second PLC bit.  So I think we do need to
prevent xhci_suspend from running if a remote wakeup has been signaled.


How about this:

xhci_bus_suspend takes the xhci->lock, checks whether any port is
resuming (by checking the USB 2.0 resume bits or the USB 3.0
remote_wakeup_bits), and fails if it finds that.  Then it sets a new
flag, xhci->bus_state->suspended.

When the port status handler grabs the xHCI lock and finds a USB 3.0
port has a device-initiated resume in progress, it sets the
remote_wakeup_bits for that port.  If it finds
xhci->bus_state->suspended is set, it attempts to kick the roothub.  If
not, it skips the kick, and acts exactly an external USB 3.0 hub that
does not send an interrupt transfer on device-initiated resume.

That may mean we kick the roothub when the USB 3.0 bus is suspended, but
the USB 2.0 bus is not (there is a per-roothub bus_state structure).
However, I think that's necessary to avoid having the USB core suspend
the USB 2.0 bus and PCI device before the end of the USB 3.0
device-initiated resume causes the PLC to be set.


There's still the synchronization between the interrupt handler and the
PCI suspend function that needs to be taken care of.  When we enter
xhci_suspend, there might be an interrupt in progress.  If it hasn't had
time to grab the xHCI spinlock, xhci_suspend will stop and suspend the
PCI device before the interrupt handler has a chance to run.

Table 10 lists when the xHC will issue a PME# due to a port event.
Unfortunately, it only lists an entry for when the port is in U3 and
resume signaling happens.  It doesn't list when the port is in Resume
and the host is suspended.  So we can't rely on the host controller
automatically bringing itself out of PCI suspend in this race condition.
(Although if I was a hardware engineer, I would signal a PME# if I saw
any resume signaling on a port.)

But the interrupt handler will run after xhci_suspend drops the lock.
Right now, it will discover the status register reads as 0xfffffff
(since the PCI device is suspended), think the HCD was removed, and call
usb_hcd_died().  It probably should be changed to check whether the PCI
device is suspended, and call usb_hcd_resume_root_hub() or maybe kick
the roothub?  I'm not quite sure what the right function to call is, but
basically I want the PCI device to be unsuspended.

I think this solution will be OK, because the device will still be
signaling the resume until we set the link state to U0.  So it will just
be signaling for a bit longer until we resume the PCI device.

It might be clearer if I just put this into code, but how does it sound?

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux