Re: [PATCH][RFC] xhci fixes

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Mon, 16 Aug 2010 15:02:59 -0700

Hi Paul,

Thanks for sending this.  I'd really like you to break it up into a
couple of patches.  I have a couple questions, too.

First, Andiry already sent a patch to update the dequeue pointer in
process_isoc_td().  That just got added to the master branch a couple
days ago.  So you should take that bit of code out of this patch.

On Fri, Aug 13, 2010 at 06:48:44PM -0700, Paul Zimmerman wrote:
> Hi Sarah,
> 
> Here is the patch that we needed to make isoc transfers work reliably with the
> Synopsys xHCI controller.
> 
> Some high-speed USB 2.0 webcams did not work, because the code in
> xhci_endpoint_init() and xhci_get_max_esit_payload() was masking off bit 10 of
> the wMaxPacketSize value.
> 
> Occasionally the system would freeze when connecting or disconnecting a
> USB 2.0 webcam to the xHCI host, or when starting up or shutting down the
> webcam app (Cheese). We traced this to one of two issues:
> 
>  - Sometimes one of the BUGs in xhci_find_new_dequeue_state() would trigger.
>    To avoid locking up the system, we replaced the BUGs with WARN_ONs.

Those BUG() calls were there for a reason.  Does the system work without
them?

>  - Sometimes there would be an infinite loop in handle_tx_event() because
>    'event_seg' was NULL. There were also problems with the code possibly
>    using an uninitialized 'event_trb', and not resetting 'ret' to 0 on
>    subsequent passes thru the loop.

Can you break out the fixes for uninitialized variables into a separate
patch?

> The patch is against Linus' tree. The patch also has a couple of other fixes
> that were already submitted by other folks, but did not make it into 2.6.35.
> I left those in since this is just an RFC.
> 
> --- 
>  drivers/usb/host/xhci-mem.c  |    4 +-
>  drivers/usb/host/xhci-ring.c |   51 ++++++++++++++++++++++++++---------------
>  2 files changed, 34 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 4e51343..bdf9056 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1043,7 +1043,7 @@ static inline u32 xhci_get_max_esit_payload(struct xhci_hcd *xhci,
>  	if (udev->speed == USB_SPEED_SUPER)
>  		return ep->ss_ep_comp.wBytesPerInterval;
>  
> -	max_packet = ep->desc.wMaxPacketSize & 0x3ff;
> +	max_packet = ep->desc.wMaxPacketSize & 0x7ff;
>  	max_burst = (ep->desc.wMaxPacketSize & 0x1800) >> 11;
>  	/* A 0 in max burst means 1 transfer per ESIT */
>  	return max_packet * (max_burst + 1);
> @@ -1133,7 +1133,7 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
>  		/* Fall through */
>  	case USB_SPEED_FULL:
>  	case USB_SPEED_LOW:
> -		max_packet = ep->desc.wMaxPacketSize & 0x3ff;
> +		max_packet = ep->desc.wMaxPacketSize & 0x7ff;
>  		ep_ctx->ep_info2 |= MAX_PACKET(max_packet);
>  		break;

Please break this into a separate patch.  I see that bits 10:0 are
supposed to be the maximum packet size, which should make a bitmask of
0x7ff, not 0x3ff.  Thanks for catching that.

>  	default:
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index bc3f4f4..d9d89df 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -131,7 +131,7 @@ static void next_trb(struct xhci_hcd *xhci,
>  		*seg = (*seg)->next;
>  		*trb = ((*seg)->trbs);
>  	} else {
> -		*trb = (*trb)++;
> +		(*trb)++;

This part of the patch is from John's compiler fix patch, correct?  It's
already been sent to Greg, but hasn't made it into the master branch yet
because Greg hasn't queued it.

>  	}
>  }
>  
> @@ -474,8 +474,11 @@ void xhci_find_new_dequeue_state(struct xhci_hcd *xhci,
>  	state->new_deq_seg = find_trb_seg(cur_td->start_seg,
>  			dev->eps[ep_index].stopped_trb,
>  			&state->new_cycle_state);
> -	if (!state->new_deq_seg)
> -		BUG();
> +	if (!state->new_deq_seg) {
> +		WARN_ON(1);
> +		return;
> +	}
> +

This is not a very good solution.  This function is called when the
driver wants to cancel a transfer, and the host is in the middle of the
transfer.  If you just return there, then the transfer will never get
taken off the hardware queue, and the hardware could potentially access
freed (or reused) memory.  Or the transfer will never get cancelled.
Really bad, either way.

I'd like to figure out why exactly find_trb_seg() is returning NULL.  Do
you have any ideas?  If not, I'll send you a debugging patch to get more
info.

>  	/* Dig out the cycle state saved by the xHC during the stop ep cmd */
>  	xhci_dbg(xhci, "Finding endpoint context\n");
>  	ep_ctx = xhci_get_ep_ctx(xhci, dev->out_ctx, ep_index);
> @@ -486,8 +489,10 @@ void xhci_find_new_dequeue_state(struct xhci_hcd *xhci,
>  	state->new_deq_seg = find_trb_seg(state->new_deq_seg,
>  			state->new_deq_ptr,
>  			&state->new_cycle_state);
> -	if (!state->new_deq_seg)
> -		BUG();
> +	if (!state->new_deq_seg) {
> +		WARN_ON(1);
> +		return;
> +	}
>  
>  	trb = &state->new_deq_ptr->generic;
>  	if ((trb->field[3] & TRB_TYPE_BITMASK) == TRB_TYPE(TRB_LINK) &&
> @@ -1551,6 +1556,10 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_td *td,
>  	/* calc actual length */
>  	if (ep->skip) {
>  		td->urb->iso_frame_desc[idx].actual_length = 0;
> +		/* Update ring dequeue pointer */
> +		while (ep_ring->dequeue != td->last_trb)
> +			inc_deq(xhci, ep_ring, false);
> +		inc_deq(xhci, ep_ring, false);
>  		return finish_td(xhci, td, event_trb, event, ep, status, true);
>  	}

As I said, this bit has been sent off to Greg.

> @@ -1824,6 +1833,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  	}
>  
>  	do {
> +		ret = 0;
> +
>  		/* This TRB should be in the TD at the head of this ring's
>  		 * TD list.
>  		 */
> @@ -1839,7 +1850,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  				xhci_dbg(xhci, "td_list is empty while skip "
>  						"flag set. Clear skip flag.\n");
>  			}
> -			ret = 0;
>  			goto cleanup;
>  		}

This bit should be in a separate patch.

> @@ -1874,20 +1884,23 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  						"Skip it\n");
>  				goto cleanup;
>  			}
> -		}
>  
> -		/* Now update the urb's actual_length and give back to
> -		 * the core
> -		 */
> -		if (usb_endpoint_xfer_control(&td->urb->ep->desc))
> -			ret = process_ctrl_td(xhci, td, event_trb, event, ep,
> -						 &status);
> -		else if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
> -			ret = process_isoc_td(xhci, td, event_trb, event, ep,
> -						 &status);
> -		else
> -			ret = process_bulk_intr_td(xhci, td, event_trb, event,
> -						 ep, &status);
> +			/* Now update the urb's actual_length and give back to
> +			 * the core
> +			 */
> +			if (usb_endpoint_xfer_control(&td->urb->ep->desc))
> +				ret = process_ctrl_td(xhci, td, event_trb, event, ep,
> +							 &status);
> +			else if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
> +				ret = process_isoc_td(xhci, td, event_trb, event, ep,
> +							 &status);
> +			else
> +				ret = process_bulk_intr_td(xhci, td, event_trb, event,
> +							 ep, &status);
> +		} else {
> +			xhci_dbg(xhci, "event_seg is NULL\n");
> +			return -ENODEV;
> +		}

I assume you did this part of the patch because handle_tx_event() was
infinite looping.  I don't think it's a very good fix, and I'd rather
know the underlying cause for this.

The fix looks a bit...odd.  The only way the else statement would run is
if this statement is false:
	if (!event_seg &&
                   (!ep->skip ||
		   !usb_endpoint_xfer_isoc(&td->urb->ep->desc))) {

or
	if (!(!event_seg && (!ep->skip || !usb_endpoint_xfer_isoc(&td->urb->ep->desc))))
	if ((event_seg || !(!ep->skip || !usb_endpoint_xfer_isoc(&td->urb->ep->desc))))
	if ((event_seg || (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc))))

We know the event_seg is NULL, so the skip flag was set and this was an
isochronous endpoint.  I wonder if the event_seg was NULL because of the
lack of setting the dequeue pointer while processing the skipped TDs.

Does the infinite looping go away when you apply *just* the patch to fix
updating the dequeue pointer?  What about the issues with triggering the
BUG() calls while setting the TR dequeue pointer?

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html