RE: Issue with hub reset-resume under xHCI

> -----Original Message-----
> From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx]
> Sent: Saturday, January 15, 2011 6:00 AM
> To: Xu, Andiry
> Cc: linux-usb@xxxxxxxxxxxxxxx; Alan Stern
> Subject: Issue with hub reset-resume under xHCI
> 
> Hi Andiry,
> 
> I just wanted to give you a heads up about a potential bug I spent some
> time tracking down last week.  It only showed up on the branch I've been
> using for the split roothub patches, which is based on patches Greg sent
> in for 2.6.38.  Those patches include the update to make the USB core
> use the runtime PM interface to suspend USB devices.  I haven't been
> able to reproduce this on a generic 2.6.37 kernel, so I suspect either
> the split roothub patches or the runtime suspend update revealed the
> bug.
> 
> Here's the patch I created against my branch.  The xHCI suspend/resume
> variables were moved around into a bus_state structure, but I think you
> can get the general gist of it.  Can you look it over, and figure out if
> this bug is possible in 2.6.38?  I'm still not sure if it was something
> I introduced with the split roothub patches.  If it would show up in
> 2.6.38, I'll revise the patch against 2.6.38.
> 

I tried, but I have not been able to reproduce this issue yet, so I will
try to glean something from the dmesg log...

> 
> 8<-------------------------------------------------------------------->8
> From 0e3395514321b57ed185edc3fe75a39189bf738d Mon Sep 17 00:00:00 2001
> From: Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx>
> Date: Fri, 7 Jan 2011 11:13:00 -0800
> Subject: [PATCH] xhci: Clear internal resume state on reset.
> 
> The setup that displays this issue is a HS multi-TT hub plugged into the
> xHCI roothub.  Sometimes when I quickly plug in a device into the HS hub
> right after the hub (but not the bus) is suspended, the hub will resume,
> and then there will be a transfer error on the GetStatus for the HS hub.
> This causes the USB core to start a reset-resume, without first checking
> the port status of that port with a GetPortStatus call into the roothub.
> 
> The xHCI driver must time the port resume itself, and turn off resume
> signaling after a period of time.  It does that by keeping track of the
> time to turn off resume signaling in an array called resume_done.  It
> relies on the GetPortStatus call by the USB core to check if it needs to
> clear the resume bit in the port status and clear the time in resume_done.
> When the USB core sees an error on the GetStatus control transfer to the
> HS hub, it starts a reset-resume, which will immediately set the reset bit
> in the roothub port status registers, without issuing a GetPortStatus.
> 
> This causes the time in resume_done to linger, despite the fact that the
> device has been reset and is no longer resuming.  This will cause the xHCI
> bus suspend functions to not allow the roothub to suspend, since a device
> is resuming:
> 
> Jan  7 10:28:39 broadway kernel: [  965.325547] hub 1-1:1.0: state 7 ports 4 chg 0000 evt 0000
> Jan  7 10:28:41 broadway kernel: [  967.824026] hub 1-1:1.0: hub_suspend
> Jan  7 10:28:41 broadway kernel: [  967.840013] usb 1-1: usb auto-suspend
> Jan  7 10:28:43 broadway kernel: [  969.202878] Port Status Change Event for port 3
> Jan  7 10:28:43 broadway kernel: [  969.202881] port resume event for port 3
> Jan  7 10:28:43 broadway kernel: [  969.202884] resume HS port 3

I assume the port number 3 we got here is read directly from HW, which
numbers USB3 ports and USB2 ports together. Since you have split the hub
and moved the resume_done array into bus_state, the port number should be
translated to the "faked" port number in the driver. Maybe the wrong
resume_done entry is set here.

(A small suggestion: clarify the port numbers in the print messages.
Sometimes the number is the one reported by HW, sometimes it is 1-based,
and sometimes it is 0-based. This causes confusion.)
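
To illustrate the translation and the logging suggestion, here is a
minimal sketch. It is not code from the driver: the port layout (HW ports
1-2 are USB3, HW ports 3-4 are USB2) and the helper names are assumptions
chosen only so that HW port 3 lands on index 0 of the USB2 roothub, as in
the log.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed layout, for illustration only: four hardware ports numbered
 * from 1, where ports 1-2 are USB3 and ports 3-4 are USB2. */
enum port_proto { PROTO_USB3, PROTO_USB2 };

static const enum port_proto hw_port_proto[] = {
	PROTO_USB3, PROTO_USB3, PROTO_USB2, PROTO_USB2,
};
#define NUM_HW_PORTS (sizeof(hw_port_proto) / sizeof(hw_port_proto[0]))

/* Hypothetical helper: convert a 1-based hardware port number into the
 * 0-based index within the roothub that speaks the same protocol.
 * Returns -1 if hw_port is out of range. */
static int hw_port_to_roothub_index(unsigned int hw_port)
{
	int index = 0;
	unsigned int i;

	if (hw_port < 1 || hw_port > NUM_HW_PORTS)
		return -1;
	/* Count the earlier hardware ports that use the same protocol. */
	for (i = 0; i < hw_port - 1; i++)
		if (hw_port_proto[i] == hw_port_proto[hw_port - 1])
			index++;
	return index;
}

/* Print all three numbering schemes in a single message, so a reader of
 * the log never has to guess which one is meant. */
static void print_port(unsigned int hw_port)
{
	int idx = hw_port_to_roothub_index(hw_port);

	assert(idx >= 0);
	printf("HW port %u (1-based) -> roothub index %d (0-based), "
	       "roothub port %d (1-based)\n", hw_port, idx, idx + 1);
}
```

With this assumed layout, print_port(3) reports roothub index 0 and
roothub port 1, which matches "actual port 0" and "port 1 is resuming"
later in the log.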

I see that you have a patch "Fix error in handle_port_status() on port
resume" on your branch to fix this port number error. Does that patch
help with this issue?

> Jan  7 10:28:43 broadway kernel: [  969.202899] hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0002
> Jan  7 10:28:43 broadway kernel: [  969.202904] get port status, actual port 0 status  = 0x400fe3

The port status indicates there is a Port Link State Change (U3->Resume)
and the resume signal is on.

> Jan  7 10:28:43 broadway kernel: [  969.216014] usb 1-1: usb wakeup-resume
> Jan  7 10:28:43 broadway kernel: [  969.216019] get port status, actual port 0 status  = 0xfe3

The PLC bit is cleared, but the resume signal is still on.

> Jan  7 10:28:43 broadway kernel: [  969.216022] usb 1-1: finish resume

At this point the xHCI hub driver should have already cleared
resume_done[wIndex] and written 0 to the PLS field.

> Jan  7 10:28:43 broadway kernel: [  969.216380] xhci_hcd 0000:01:00.0: WARN: transfer error on endpoint
> Jan  7 10:28:43 broadway kernel: [  969.216389] usb 1-1: retry with reset-resume
> Jan  7 10:28:43 broadway kernel: [  969.266718] Port Status Change Event for port 3
> Jan  7 10:28:43 broadway kernel: [  969.272012] get port status, actual port 0 status  = 0x200e03

Here the port status indicates the port is in U0 state, and Port Reset
Change is set, so the port has been reset.
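
For anyone who wants to check my reading of the three status values, here
is a small decoder sketch. The bit positions follow the PORTSC layout in
the xHCI specification; the macro and function names are defined locally
for this example and are only modelled on the driver's naming.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* PORTSC fields used in this thread (xHCI specification layout). */
#define PORT_CONNECT	(1u << 0)	/* CCS: device connected */
#define PORT_PE		(1u << 1)	/* PED: port enabled */
#define PORT_RESET	(1u << 4)	/* PR: reset in progress */
#define PORT_PLS_SHIFT	5
#define PORT_PLS_MASK	(0xfu << PORT_PLS_SHIFT)	/* PLS: link state */
#define PORT_RC		(1u << 21)	/* PRC: port reset change */
#define PORT_PLC	(1u << 22)	/* PLC: port link state change */

/* Port Link State values relevant here. */
#define PLS_U0		0
#define PLS_U3		3
#define PLS_RESUME	15

/* Extract the 4-bit Port Link State field. */
static unsigned int portsc_pls(uint32_t portsc)
{
	return (portsc & PORT_PLS_MASK) >> PORT_PLS_SHIFT;
}

static const char *pls_name(unsigned int pls)
{
	switch (pls) {
	case PLS_U0:		return "U0";
	case PLS_U3:		return "U3";
	case PLS_RESUME:	return "Resume";
	default:		return "other";
	}
}

/* Print the fields discussed above for one raw PORTSC value. */
static void portsc_dump(uint32_t portsc)
{
	unsigned int pls = portsc_pls(portsc);

	printf("0x%06x: CCS=%d PED=%d PR=%d PLS=%u (%s) PRC=%d PLC=%d\n",
	       (unsigned int)portsc,
	       !!(portsc & PORT_CONNECT), !!(portsc & PORT_PE),
	       !!(portsc & PORT_RESET), pls, pls_name(pls),
	       !!(portsc & PORT_RC), !!(portsc & PORT_PLC));
}
```

Running portsc_dump() on 0x400fe3, 0xfe3 and 0x200e03 gives PLS=15
(Resume) with PLC set, PLS=15 (Resume) with PLC clear, and PLS=0 (U0)
with PRC set, respectively.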

> Jan  7 10:28:43 broadway kernel: [  969.328299] usb 1-1: reset high speed USB device using xhci_hcd and address 4
> ...
> Jan  7 10:28:47 broadway kernel: [  973.844013] hub 1-1:1.0: hub_suspend
> Jan  7 10:28:47 broadway kernel: [  973.860015] usb 1-1: usb auto-suspend
> Jan  7 10:28:49 broadway kernel: [  975.876023] hub 1-0:1.0: hub_suspend
> Jan  7 10:28:49 broadway kernel: [  975.876030] usb usb1: bus auto-suspend
> Jan  7 10:28:49 broadway kernel: [  975.876033] suspend failed because port 1 is resuming
> Jan  7 10:28:49 broadway kernel: [  975.876035] usb usb1: bus suspend fail, err -16

resume_done[0] is not zero, so the bus suspend fails. It should have been
cleared by this point.

I think Alan is right that resume_done should already have been cleared
to 0 before the port reset happens. Something is wrong, but I don't think
it's related to the port reset. I think we should trace when resume_done
is set and when it is cleared, make sure it is set for the right port,
and make sure the driver clears the resume signal in GetPortStatus.
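
As a rough model of the bookkeeping I would like to trace, here is a
standalone sketch. Only the resume_done array and the idea of a per-bus
state structure come from the discussion above; the struct layout, the
function names and the plain integer timestamps are assumptions, and the
real driver logic is more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_PORTS 2	/* the USB2 roothub in the log has 2 ports */

/* Simplified, hypothetical stand-in for the per-bus state. */
struct bus_state_model {
	unsigned long resume_done[MAX_PORTS];	/* 0 = no resume pending */
};

/* The kind of trace message suggested above: port index and time. */
static void trace(const char *what, unsigned int port, unsigned long now)
{
	printf("[%lu] resume_done[%u] %s\n", now, port, what);
}

/* Port Status Change Event reporting a resume: record the time at which
 * resume signaling should be turned off. */
static void model_port_resume_event(struct bus_state_model *bs,
				    unsigned int port, unsigned long now,
				    unsigned long resume_time)
{
	assert(port < MAX_PORTS);
	bs->resume_done[port] = now + resume_time;
	trace("set", port, now);
}

/* GetPortStatus: once the deadline has passed, the resume signal is
 * turned off and the entry is cleared. Returns 1 if it was cleared. */
static int model_get_port_status(struct bus_state_model *bs,
				 unsigned int port, unsigned long now)
{
	assert(port < MAX_PORTS);
	if (bs->resume_done[port] && now >= bs->resume_done[port]) {
		bs->resume_done[port] = 0;
		trace("cleared", port, now);
		return 1;
	}
	return 0;
}

/* Bus suspend: refuse while any port still has a resume pending. */
static int model_bus_suspend(const struct bus_state_model *bs)
{
	unsigned int port;

	for (port = 0; port < MAX_PORTS; port++)
		if (bs->resume_done[port])
			return -EBUSY;
	return 0;
}
```

If an entry is set and no later GetPortStatus clears it, or if it is set
for the wrong index, model_bus_suspend() keeps returning -EBUSY, which is
the -16 seen at the end of the log.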

Thanks,
Andiry


