RE: [PATCH 0/4 v5] xHCI: PM patchset

"Xu, Andiry" <Andiry.Xu@xxxxxxx> · Thu, 29 Jul 2010 14:21:01 +0800

> -----Original Message-----
> From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx]
> Sent: Thursday, July 29, 2010 12:34 PM
> To: Alan Stern
> Cc: Xu, Andiry; linux-usb@xxxxxxxxxxxxxxx; gregkh@xxxxxxx
> Subject: Re: [PATCH 0/4 v5] xHCI: PM patchset
> 
> On Wed, Jul 28, 2010 at 10:39:44AM -0400, Alan Stern wrote:
> > On Tue, 27 Jul 2010, Sarah Sharp wrote:
> >
> > > On Tue, Jul 27, 2010 at 11:14:39AM -0400, Alan Stern wrote:
> > > > On Tue, 27 Jul 2010, Xu, Andiry wrote:
> > > >
> > > > > > That's not quite right.  It tries to reset the device first.
> Only if
> > > > > > the reset fails does it free and re-initialize the device.
> > > > > >
> > > > >
> > > > > Yes, you are right, I didn't make it clearly. The hub driver
will
> try to
> > > > > reset the device first. After several failures, it will assume
the
> > > > > device is gone, then free and re-initialize the device.
> > > > >
> > > > > The hub driver will call hcd->driver->reset_device() to reset
the
> > > > > device. Reset Device is a xHCI command to put a device in
> addressed or
> > > > > configured state into default state. However since the xHC is
> reset, the
> > > > > reset command will fail - all the slots are freed and disabled
> after xHC
> > > > > reset. The command failure will show "cannot reset HCD device
> state"
> > > > > messages.
> > > >
> > > > Why isn't the device already in the addressed state?  Doesn't
the
> xHC
> > > > controller detect the device and initialize it right after the
xHC
> > > > reset completes?  Is the problem just that there wasn't enough
time?
> > >
> > > After the host controller is reset after the host resume fails,
all
> the
> > > state it has about devices is gone.  The xHCI host controller
needs to
> > > be told to re-address each device (basically to go through the
> > > enumeration sequence again so it can get a slot ID and address the
> > > device).
> > >
> > > If xhci_reset_device() is called for a device that the xHC host
> doesn't
> > > know about, the call will fail because the usb_device's slot_id
field
> is
> > > set to zero.  Now, xhci_reset_device() could be changed to return
> > > success if the slot_id is set to zero.  I'm not sure if that would
be
> > > good for other error cases.
> >
> > It looks like the problem is the special-case test in
hub_port_init().
> > It skips the reset if the device is new and is running at
SuperSpeed.
> > Clearly this is not the right criterion.
> 
> What I wanted to do was skip the port reset when a USB 3.0 device is
> first plugged into the system.  When the device undergoes link
training,
> it has already gone through a warm port reset, so there's no need to
> reset the device twice.  Now, that's just an optimization, and it
> doesn't really matter to me if USB 3.0 devices get reset during an
> initial connection.
> 
> I guess I don't understand what the issue is with the special case in
> hub_port_init().  I thought the real issue was that the xHCI host
> controller is has been reset and has lost its memory about any devices
> that the USB core has allocated.
> 

Hmm...I think so. xHCI driver allocates memory for each device to
reflect its state and other information. If xHC is reset, these memories
are still there, but their data is totally invalid. In the end these
memories will be freed and re-allocated when devices are re-initialized.

> > I think the test should be moved down into xhci-hcd, which knows
more
> > about the true situation, and it should be fixed.  What really
matters
> > is whether the hardware already knows about the device, not whether
the
> > device is new.  (In fact, it is possible to have new devices that
the
> > hardware _does_ know about, and it is possible to have old devices
that
> > the hardware _doesn't_ know about.)  What should happen, if the
device
> > isn't already known, is that the hardware should be told to discover
> > it.
> >
> > And finally, what about USB-2.0 devices?  Do they have the
restrictions
> > regarding resets?  I imagine not, since they already work under xHCI
> > with the current code.
> 
> USB 2.0 devices have a similar issue.  If the xHCI hardware doesn't
know
> about them (i.e. the usb_device structure is not allocated), then
> xhci_reset_device() will return an error status.
> 
> The initialization sequence from the xHCI driver's perspective is like
> this:  When usb_alloc_dev() is called and the device is under xHCI,
then
> xhci_alloc_dev() is called, and the device is issued a slot ID by the
> xHC host.  When hub_set_address() is called, xhci_address_device() is
> called.
> 
> xhci_address_device() sets up the device slot with a default control
> endpoint ring and some information about device speed and parent hub
is
> stored in the device context.  Then it issues a command to the xHCI
host
> controller to set the address for the device and add the control
> endpoint to the host controller's schedule.  Until this command
> completes, a call to xhci_reset_device() will fail.  This
initialization
> happens for all speeds of devices.
> 
> So I think the issue that Andiry is running into is that the xHCI host
> controller is reset, and it loses all the device context state.  (I'm
> not sure if it's leaking memory at that point, Andiry may want to turn
> on kmemleak to check.)  The USB core still thinks the device exists,
> since the usb_device structure is still present, and so it tries to
call
> xhci_reset_device() after the failed resume.
> 

I called xhci_free_virt_device() for all the devices after xHC is reset.
The command ring and event ring are not freed but will be
re-initialized. There should not be memory leak as I can see. If they
are not freed after xHC reset, USB core will free them later after reset
failure. The issue is USB core does not know xHC reset and it will do
regular steps which are useless: resume the device, try resetting the
device for several times, etc. 

> > > > > After check, I think I may found the difference between EHCI
and
> xHCI in
> > > > > device address choosing. In EHCI, the hub driver calls
> choose_address()
> > > > > to set the device address. Choose_address() uses
> find_next_zero_bit() to
> > > > > find an available address. So the devnum will increase every
time,
> > > > > because bus->devnum_next is increased every time. However,
xHCI
> does not
> > > > > call choose_address(). The USB device address is assigned by
H/W,
> upon a
> > > > > successful address device command. xHCI driver then set the
devnum
> to
> > > > > the value xHC returns, so it will always be the same value.
> > > >
> > > > I thought that might be the problem.  The kernel can't handle
> addresses
> > > > that are chosen by the hardware; it wants to choose the address
> itself.
> > > > xhci-hcd should set the address to the value specified by the
kernel
> > > > instead of telling the kernel what address the hardware used.
> > >
> > > So, you're saying that the xHCI driver should have an internal
field
> for
> > > the real device address, and allow the kernel to think it assigned
a
> > > different address?
> >
> > No.  I'm saying that xhci-hcd's address_device callback should
actually
> > change the device's address to the value assigned by the kernel.
> >
> > Hmm, the Wireless USB core also seems to have its own ideas about
> > addressing devices.  It wouldn't be surprising if they end up
causing
> > trouble too.
> 
> I don't have a way to ask the xHCI host controller to issue a
particular
> address to a device.  The problem is that the hardware supports
> per-device virtualization, and guests must not be allowed to pick an
> address that conflicts with another guest.  So the hardware picks the
> address during the initial call into the xHCI driver's
> xhci_address_device() in hub_set_address().
> 
> What's worse is that the hardware checks the control endpoint for a
> SetAddress command, and doesn't allow software to manually set the
> address for the device.  See section 4.6.5:
> 
> "If the xHC detects a SET_ADDRESS request on the Default Control
> Endpoint Transfer Ring, it shall generate a TRB Error Completion
Status
> for the TD. The xHC shall never forward a SET_ADDRESS request on a
> Default Control Endpoint Transfer Ring to a USB device."
> 
> But I think there's a deeper issue with the failed host controller
> resume.  I think we're seeing this char file conflict messages because
> there is a difference in what the USB core and the xHCI host think the
> device state is.
> 
> I think what needs to happen in that case is to tell the USB core that
> all devices under the host have disconnected, and then force them to
be
> re-enumerated.  Or even possibly re-initialize the host controller,
> since the event ring pointers and such may be lost.  Andiry, what do
you
> think?
> 

I guess so. Since device address is assigned by hardware, I'm not sure
if there will be any issues if kernel uses a different address. If USB
core can be notified about xHC reset, and free all the devices and then
re-enumerate them one by one, I think there will not be any conflicts.
The event ring and command ring are re-initialized after xHC reset in my
patch.

Thanks,
Andiry

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html