Sorry for the very delayed response on this, I'm trying to catch up on my email from around Kernel Summit and my vacation. On Tue, Oct 08, 2013 at 10:19:11AM -0400, Alan Stern wrote: > On Mon, 7 Oct 2013, Sarah Sharp wrote: > > > > i am using 3.10 kernel. Also i looked at tip i see same implementation for > > > process_ctrl_td() > > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/host/xhci-ring.c > > > > Yeah, the implementation has always been this way. I've known since the > > beginning of writing the xHCI driver that it needs to handle zero-length > > data phases, but hadn't had ever had a report that it was required by a > > device under Linux. > > This code, near the end of process_ctrl_td(), looks a little strange: I really hate the xHCI ring code. :( I would rip it out and totally start over, but I'm afraid of introducing more low-level bugs than we already have. The problem with process_ctrl_td() is that it's trying to process events for all three phases of the control transfer with the same code, and it's a mess. Even I have to sit there with with the xHCI spec and map out all the different ways the code handles error conditions. > /* > * Did we transfer any data, despite the errors that might have > * happened? I.e. did we get past the setup stage? > */ > if (event_trb != ep_ring->dequeue) { > /* The event was for the status stage */ > if (event_trb == td->last_trb) { > if (td->urb->actual_length != 0) { > /* Don't overwrite a previously set error code > */ > if ((*status == -EINPROGRESS || *status == 0) && > (td->urb->transfer_flags > & URB_SHORT_NOT_OK)) > /* Did we already see a short data > * stage? */ > *status = -EREMOTEIO; > > If you already saw a short data stage, why isn't the status already set > to -EREMOTEIO? > > Also what's the reason for the test "td->urb->actual_length != 0"? > What does this have to do with getting a short data stage? Are you > assuming that at this point, actual_length will be nonzero if and only > if there was a short data stage? > > } else { > td->urb->actual_length = > td->urb->transfer_buffer_length; > } > > What's the purpose of this clause? > > It looks like the driver is confusing "actual_length == 0" with > "actual_length was never set". What if actual_length _was_ previously > set, but it was set to 0? Right, I understand what you're saying now. The code is trying to handle two separate cases in one function: 1. The setup, data, and status phases all complete successfully. The xHC only sends a successful completion event for the status phase. 2. The setup phase completes successfully, the data phase is short, and the status phase is successful. The xHC will generate a short completion event for the short data phase, and a successful completion event for the status phase. Here's the code: switch (trb_comp_code) { case COMP_SUCCESS: if (event_trb == ep_ring->dequeue) { xhci_warn(xhci, "WARN: Success on ctrl setup TRB " "without IOC set??\n"); *status = -ESHUTDOWN; } else if (event_trb != td->last_trb) { xhci_warn(xhci, "WARN: Success on ctrl data TRB " "without IOC set??\n"); *status = -ESHUTDOWN; } else { *status = 0; } break; case COMP_SHORT_TX: if (td->urb->transfer_flags & URB_SHORT_NOT_OK) *status = -EREMOTEIO; else *status = 0; break; ... if (event_trb != ep_ring->dequeue) { /* The event was for the status stage */ if (event_trb == td->last_trb) { if (td->urb->actual_length != 0) { /* Don't overwrite a previously set error code */ if ((*status == -EINPROGRESS || *status == 0) && (td->urb->transfer_flags & URB_SHORT_NOT_OK)) /* Did we already see a short data * stage? */ *status = -EREMOTEIO; } else { td->urb->actual_length = td->urb->transfer_buffer_length; } } else { /* Maybe the event was for the data stage? */ td->urb->actual_length = td->urb->transfer_buffer_length - EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)); xhci_dbg(xhci, "Waiting for status " "stage event\n"); return 0; } } For the simple case (successful event on status phase only), *status gets set to zero in the switch statement. We fall through to the if, and see that the event was for the status stage. We fall through to the else statement in this chunk of code, since td->urb->actual_length isn't set: if (td->urb->actual_length != 0) { /* Don't overwrite a previously set error code */ if ((*status == -EINPROGRESS || *status == 0) && (td->urb->transfer_flags & URB_SHORT_NOT_OK)) /* Did we already see a short data * stage? */ *status = -EREMOTEIO; } else { td->urb->actual_length = td->urb->transfer_buffer_length; } For case #2, on the short data phase, we set *status = -EREMOTEIO in the switch statement, but it has no impact on the overall code, because it's only used in the caller if the urb is being completed. Then we fall through to the data stage else statement and set urb->actual_length. Then we get the successful event for the status stage, which unconditionally overwrites urb->status to zero in the switch statement. We hit the status stage block in the lower code, and to differentiate between case #1 and case #2, we check if urb->actual_length is set. That obviously doesn't work if the data phase transfered zero bytes. So, yes, your analysis was correct. I just needed to stare at the code to get the state into my head. I guess the biggest confusion I had when I wrote this code was: what should the URB status and actual_length be if the data phase transferred successfully, but an error occurred on the status phase? I don't think the current code handles that very well for 0.95 xHCI hosts. Sarah Sharp -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html