Re: [PATCH][RFC] xhci fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andiry,

I have a couple questions about your isoc patches.  I think the do-while
loop in handle_tx_event() when a missed service interval is encountered
is trying to walk the event ring to find the next isoc TD that was
actually completed.  It doesn't touch the event ring dequeue pointer
(except to move it past the missed service interval event).

Now, what if the next event after a missed service interval isn't for
that endpoint?  What if you get two missed service intervals for the
same endpoint in a row?  Or a missed service interval followed by a Ring
Underrun/overrun event for that isoc ring?  I think the code may have
bugs in those cases, although I'm not sure if those are the exact bugs
that Paul is running into.  (Paul, I'll send you a debugging patch
shortly.)

I think the while condition at the end of the do-while loop is
incorrect:
        } while (ep->skip && trb_comp_code != COMP_MISSED_INT);

trb_comp_code is only assigned once before the do-while loop, so it will
never change to break out of the loop once we hit a missed service
interval event.  Only the clearing of ep->skip can make the loop stop.
But if the next event after the missed service interval isn't a transfer
event for that endpoint ring, then the loop will run indefinitely.

Now, Paul's patch attempted to break out of the loop and return when
handle_tx_event() found a null event_seg when ep->skip was set for an
isochronous endpoint.  It's possible the event ring and the endpoint
ring eventually synched up, and the isoc TDs that were skipped when the
handle_tx_event() function originally returned were never given back to
the driver.  That might explain the null dequeue segment pointer after
the driver attempted to cancel the isochronous URBs.  I'm still looking
at the code to figure out if it's possible though.

Sarah Sharp

On Mon, Aug 16, 2010 at 06:59:49PM -0700, Paul Zimmerman wrote:
> > From: Sarah Sharp [mailto:sarah.a.sharp@xxxxxxxxxxxxxxx]
> 
> > First, Andiry already sent a patch to update the dequeue pointer in
> > process_isoc_td().  That just got added to the master branch a couple
> > days ago.  So you should take that bit of code out of this patch.
> 
> Will do.
> 
> > Those BUG() calls were there for a reason.  Does the system work without
> > them?
> 
> Yes, it works without them, at least with the Synopsys controller. I also tested
> a bit with the NEC controller and didn't see any problems.
> 
> According to Linus, drivers should do WARN_ON() instead of BUG() in interrupt
> context, unless it's absolutely impossible to continue. Sorry, I don't have a
> reference to that email handy.
> 
> > Can you break out the fixes for uninitialized variables into a separate
> > patch?
> 
> Will do.
> 
> > Please break this into a separate patch.  I see that bits 10:0 are
> > supposed to be the maximum packet size, which should make a bitmask of
> > 0x7ff, not 0x3ff.  Thanks for catching that.
> 
> Will do.
> 
> > > @@ -131,7 +131,7 @@ static void next_trb(struct xhci_hcd *xhci,
> > >  		*seg = (*seg)->next;
> > >  		*trb = ((*seg)->trbs);
> > >  	} else {
> > > -		*trb = (*trb)++;
> > > +		(*trb)++;
> >
> > This part of the patch is from John's compiler fix patch, correct?  It's
> > already been sent to Greg, but hasn't made it into the master branch yet
> > because Greg hasn't queued it.
> 
> Yeah, I'll drop that too. By the way, that patch should probably go to
> -stable too, since it's undefined behavior in C, if I'm not mistaken.
> 
> > > @@ -474,8 +474,11 @@ void xhci_find_new_dequeue_state(struct xhci_hcd *xhci,
> > >  	state->new_deq_seg = find_trb_seg(cur_td->start_seg,
> > >  			dev->eps[ep_index].stopped_trb,
> > >  			&state->new_cycle_state);
> > > -	if (!state->new_deq_seg)
> > > -		BUG();
> > > +	if (!state->new_deq_seg) {
> > > +		WARN_ON(1);
> > > +		return;
> > > +	}
> > > +
> > 
> > This is not a very good solution.  This function is called when the
> > driver wants to cancel a transfer, and the host is in the middle of the
> > transfer.  If you just return there, then the transfer will never get
> > taken off the hardware queue, and the hardware could potentially access
> > freed (or reused) memory.  Or the transfer will never get cancelled.
> > Really bad, either way.
> > 
> > I'd like to figure out why exactly find_trb_seg() is returning NULL.  Do
> > you have any ideas?  If not, I'll send you a debugging patch to get more
> > info.
> 
> Sorry, I don't have any ideas. That part of the xhci driver is black magic
> to me. A debugging patch would be a good idea, I think.
> 
> > > @@ -1551,6 +1556,10 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_td *td,
> > >  	/* calc actual length */
> > >  	if (ep->skip) {
> > >  		td->urb->iso_frame_desc[idx].actual_length = 0;
> > > +		/* Update ring dequeue pointer */
> > > +		while (ep_ring->dequeue != td->last_trb)
> > > +			inc_deq(xhci, ep_ring, false);
> > > +		inc_deq(xhci, ep_ring, false);
> > >  		return finish_td(xhci, td, event_trb, event, ep, status, true);
> > >  	}
> > 
> > As I said, this bit has been sent off to Greg.
> 
> Will drop it.
> 
> > > @@ -1839,7 +1850,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> > >  				xhci_dbg(xhci, "td_list is empty while skip "
> > >  						"flag set. Clear skip flag.\n");
> > >  			}
> > > -			ret = 0;
> > >  			goto cleanup;
> > >  		}
> > 
> > This bit should be in a separate patch.
> 
> Will do.
> 
> > 
> > > @@ -1874,20 +1884,23 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> > >  						"Skip it\n");
> > >  				goto cleanup;
> > >  			}
> > > -		}
> > >
> > > -		/* Now update the urb's actual_length and give back to
> > > -		 * the core
> > > -		 */
> > > -		if (usb_endpoint_xfer_control(&td->urb->ep->desc))
> > > -			ret = process_ctrl_td(xhci, td, event_trb, event, ep,
> > > -						 &status);
> > > -		else if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
> > > -			ret = process_isoc_td(xhci, td, event_trb, event, ep,
> > > -						 &status);
> > > -		else
> > > -			ret = process_bulk_intr_td(xhci, td, event_trb, event,
> > > -						 ep, &status);
> > > +			/* Now update the urb's actual_length and give back to
> > > +			 * the core
> > > +			 */
> > > +			if (usb_endpoint_xfer_control(&td->urb->ep->desc))
> > > +				ret = process_ctrl_td(xhci, td, event_trb, event, ep,
> > > +							 &status);
> > > +			else if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
> > > +				ret = process_isoc_td(xhci, td, event_trb, event, ep,
> > > +							 &status);
> > > +			else
> > > +				ret = process_bulk_intr_td(xhci, td, event_trb, event,
> > > +							 ep, &status);
> > > +		} else {
> > > +			xhci_dbg(xhci, "event_seg is NULL\n");
> > > +			return -ENODEV;
> > > +		}
> > 
> > I assume you did this part of the patch because handle_tx_event() was
> > infinite looping.  I don't think it's a very good fix, and I'd rather
> > know the underlying cause for this.
> > 
> > The fix looks a bit...odd.  The only way the else statement would run is
> > if this statement is false:
> > 	if (!event_seg &&
> >                    (!ep->skip ||
> > 		   !usb_endpoint_xfer_isoc(&td->urb->ep->desc))) {
> > 
> > or
> > 	if (!(!event_seg && (!ep->skip || !usb_endpoint_xfer_isoc(&td->urb->ep->desc))))
> > 	if ((event_seg || !(!ep->skip || !usb_endpoint_xfer_isoc(&td->urb->ep->desc))))
> > 	if ((event_seg || (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc))))
> > 
> > We know the event_seg is NULL, so the skip flag was set and this was an
> > isochronous endpoint.  I wonder if the event_seg was NULL because of the
> > lack of setting the dequeue pointer while processing the skipped TDs.
> > 
> > Does the infinite looping go away when you apply *just* the patch to fix
> > updating the dequeue pointer?  What about the issues with triggering the
> > BUG() calls while setting the TR dequeue pointer?
> 
> No, Andiry's fix for the dequeue pointer didn't fix either of those.
> 
> -- 
> Paul
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux