On Wed, Sep 12, 2012 at 03:36:39PM +0530, Ajay Gupta wrote:
> Hi,
>
> I am using v3.5 kernel and running a test where I disconnect a SS MSC
> device while a big file is being written to it. I am also watching the
> lsusb output in parallel. Expectation is that SS MSC device should
> immediately disappear from lsusb output but sometime I see that SS MSC
> is listed for 30 seconds and then only disappears. Log is copied
> below.
>
> Looking further into the issue shows that in failing case an URB is
> not given back by XHCI driver and so SCSI layer unlinks it after
> 30second of timeout. This URB was actually given to XHCI host
> controller and a disconnect happens while packet transfer was in
> progress. I looked at XHCI driver and could not find request being
> aborted by driver (in xhci_free_dev() fn) before issuing disable slot
> or reset device command. What is expectation in this scenario?

The expectation is that the mass storage driver should have aborted any
URBs in its disconnect function before xhci_free_dev() is called. I
believe all USB drivers are required to do this. If that assumption
isn't true, then, yes, I need to revisit any place that xHCI rings get
freed.

> Is it
> XHCI driver responsibility to abort and giveback all the request (as
> in EHCI using endpoint_disable())
> Why XHCI
> driver doesn't have an endpoint_disable() implemented as done for
> other controller?

The xHCI driver just works differently than EHCI. It's not a
requirement to implement the endpoint disable function, AFAIK. From
what I remember, the endpoint disable function was supposed to be used
before switching alternate interface settings, but it was called too
late to be of much use to the xHCI bandwidth functions. So I created
new bandwidth functions and ignored the endpoint disable functions.

Maybe Alan can explain what the requirements about the endpoint disable
function are?

> OR host controller hardware should
> abort and post an error completion event for each request?

If the host controller hardware was in the middle of a transfer when the
device was disconnected, and that caused the transfer to time out, then,
yes, the host should return an event with a transfer error and halt the
endpoint ring, as described in the xHCI 1.0 spec, section 4.10.2.3.

Even if the host wasn't strictly speaking in the middle of a transfer
during the disconnect, the endpoint ring should remain on the host's
schedule as long as the xHCI driver didn't issue a stop endpoint
command. I would expect that the transfer would time out the next time
it was executed from the schedule.

Basically, the host controller has to play dumb and hand back events
with error statuses for all TDs that continue to be schedulable after a
device disconnect. Eventually the USB core will notice the disconnect
and notify the driver, and the driver will cancel any outstanding URBs.

> =============== Working case ====================
> [ 3406.676989] Port Status Change Event for port 2
> [ 3406.711530] get port status, actual port 1 status = 0x4202c0
> [ 3406.711531] Get port status returned 0x4102c0
> [ 3406.711629] clear port connect change, actual port 1 status = 0x4002c0
> [ 3406.711664] clear port link state change, actual port 1 status = 0x2c0
> [ 3406.721606] [E.f2e0b980.Transfer error on endpoint <=== URB
> submitted here but gets transfer error
> [ 3406.721742] Cleaning up stalled endpoint ring
> [ 3406.721744] Finding segment containing stopped TRB.
> [ 3406.721746] Finding endpoint context
> [ 3406.721747] Finding segment containing last TRB in TD.
> [ 3406.721749] Cycle state = 0x1
> [ 3406.721750] New dequeue segment = f2073880 (virtual)
> [ 3406.721752] New dequeue pointer = 0x337be380 (DMA)
> [ 3406.721754] Queueing new dequeue state
> [ 3406.721756] Set TR Deq Ptr cmd, new deq seg = f2073880 (0x337be000
> dma), new deq ptr = f37be380 (0x337be380 dma), new cycle = 1
> [ 3406.721758]// Ding dong!
> [ 3406.721855] Giveback URB f2e0b980, len = 0, expected = 31, status =
> -71 <=== URB given back by XHCI driver
> =================================================
>
>
> =============== Non Working case ================
> [ 2971.576389] Port Status Change Event for port 2
> [ 2971.576487] [E.f2d0c480. <=== URB submitted but no error.
> [ 2971.585007] get port status, actual port 1 status = 0x4202c0
> [ 2971.585079] Get port status returned 0x4102c0
> [ 2971.585178] clear port connect change, actual port 1 status = 0x4002c0
> [ 2971.585213] clear port link state change, actual port 1 status = 0x2c0
> [ 2971.640030] get port status, actual port 1 status = 0x2d1
> [ 2971.640031] Get port status returned 0x2d1
> [ 2971.696029] get port status, actual port 1 status = 0x2d1
> [ 2971.696031] Get port status returned 0x2d1
> [ 2971.900031] get port status, actual port 1 status = 0x2d1
> [ 2971.900034] Get port status returned 0x2d1
> [ 2972.060480] Port Status Change Event for port 2
> [ 2972.104039] get port status, actual port 1 status = 0x2802a0
> [ 2972.104041] Get port status returned 0x3002a0
> [ 2972.104079] clear port reset change, actual port 1 status = 0x802a0
> [ 2972.104108] clear port warm(BH) reset change, actual port 1 status = 0x2a0
> [ 2972.104138] clear port link state change, actual port 1 status = 0x2a0
> [ 2972.104144] usb 6-2: USB disconnect, device number 6
>
> <=== 30 seconds gap ====
>
>
> [ 3002.080058] Cancel URB f2d0c480, dev 2, ep 0x81, starting at offset
> 0x337ad060 <== SCSI layer cancelling URB after 30 seconds
> [ 3002.080063]// Ding dong!
> [ 2972.060569] Stopped on Transfer TRB
> [ 3002.080817] Removing canceled TD starting at 0x337ad060 (dma).
> [ 3002.080820] Finding segment containing stopped TRB.
> [ 3002.080822] Finding endpoint context
> [ 3002.080823] Finding segment containing last TRB in TD.
> [ 3002.080825] Cycle state = 0x0
> [ 3002.080827] New dequeue segment = f71628f0 (virtual)
> [ 3002.080828] New dequeue pointer = 0x337ad070 (DMA)
> [ 3002.080831] Set TR Deq Ptr cmd, new deq seg = f71628f0 (0x337ad000
> dma), new deq ptr = f37ad070 (0x337ad070 dma), new cycle = 0

According to that, the rings are still intact and we can issue a stop
endpoint command successfully. Are you sure xhci_free_dev() was called
before the URB was canceled? Have you tried adding some printk warnings
in xhci_free_dev() if the td_list for any endpoint is not empty?

We can certainly change the driver to stop the endpoints and give back
any URBs in xhci_free_dev(), but I want to make sure we understand
what's actually going on in the scenario you describe.

Sarah Sharp
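
For reference, the td_list check suggested above could look roughly
like the user-space sketch below. It is only a model, not driver code:
every model_* name is a hypothetical stand-in (for struct list_head,
the per-endpoint ring and the virtual device), the count of 31
endpoints per device is an assumption about the driver's layout, and
fprintf() takes the place of the printk warning. The function walks
each endpoint, skips those with no ring or an empty TD list, and
reports how many still have TDs queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: the driver tracks up to 31 endpoints per device. */
#define MODEL_NUM_EPS 31

/* Hypothetical stand-in for the kernel's circular struct list_head. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *head)
{
	head->next = head;
	head->prev = head;
}

static int model_list_empty(const struct model_list *head)
{
	return head->next == head;
}

static void model_list_add_tail(struct model_list *item,
				struct model_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical stand-ins for a TD, an endpoint ring and a device. */
struct model_td {
	struct model_list td_list;	/* link in the ring's TD list */
};

struct model_ring {
	struct model_list td_list;	/* TDs queued and not yet given back */
};

struct model_ep {
	struct model_ring *ring;	/* NULL if the endpoint has no ring */
};

struct model_virt_dev {
	struct model_ep eps[MODEL_NUM_EPS];
};

/*
 * Warn about every endpoint that still has TDs queued and return the
 * number of such endpoints. In the driver this would be a printk
 * placed before the rings are freed.
 */
static int model_warn_pending_tds(const struct model_virt_dev *dev,
				  int slot_id)
{
	int i, pending = 0;

	assert(dev != NULL);
	for (i = 0; i < MODEL_NUM_EPS; i++) {
		const struct model_ring *ring = dev->eps[i].ring;

		if (!ring || model_list_empty(&ring->td_list))
			continue;
		fprintf(stderr, "WARN: slot %d ep index %d still has TDs queued\n",
			slot_id, i);
		pending++;
	}
	return pending;
}
```

If a check like this fires during the disconnect test, URBs were still
queued when the device was being freed; if it stays silent, the
30 second delay is coming from somewhere else.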