Re: URB IRQ fires on URB after usb_kill_urb() already completed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 25 Feb 2015, Devin Heitmueller wrote:

> Hi Alan,
> 
> I think I see what's going on.  Permit me to comment on your
> explanation of urb->use_count first, since it's relevant later on.

I won't go over this in great detail, because I think your proposed 
explanation is wrong.  My impression is that we aren't looking in the 
right place.

> 1.  A task calls usb_kill_urb() against a URB with use count of 1 (or
> sometimes 2).
> 2.  usb_kill_urb() increments urb->reject so any attempt to resubmit
> the URB fails
> 3.  usb_kill_urb() calls usb_hcd_unlink_urb()
> ***
> 4.  usb_kill_urb() spins on &usb_kill_urb_queue waiting for use_count
> to reach zero.
> 5.  usb_kill_urb() returns to the caller, which typically then calls
> usb_free_urb(urb).
> 
> Between steps 3 and 4, the following things occur:
> 3a.  usb_hcd_unlink_urb() sees urb->use_count is nonzero and calls
> unlink1() against the URB
> 3b.  unlink1() calls ehci_urb_dequeue(), which calls
> usb_hcd_check_unlink_urb, which sets the urb->unlinked field to
> -ENOENT
> 3c.  ehci_urb_dequeue() calls start_unlink_async() for the QH tied to the URB
> 
> Sometime while usb_kill_urb() is spinning on urb->use_count, the IRQ
> handler eventually fires:
> 3d.  ehci_irq() calls end_unlink_async()
> 3e.  end_unlink_async() calls qh_completions()
> 3f.  qh_completions() calls ehci_urb_done()
> 3g.  ehci_urb_done() calls __usb_hcd_giveback_urb
> 3h.   __usb_hcd_giveback_urb calls the completion handler for the URB
> 3i. Because the URB isn't resubmitted, use_count drops to zero and
> wake_up(&usb_kill_urb_queue) is called.
> 
> ===
> 
> For the failure case, I see the completion handler firing for the URB,
> but the URB isn't being resubmitted because of the flag in the
> completion handler.  Then when the kill comes around, the
> urb->use_count is zero and thus it returns immediately rather than
> blocking on the wait queue.  But this doesn't change the fact that the
> end_unlink_async() hasn't necessarily been invoked yet for the URB.
> When the end_unlink_async() does finally do its housekeeping and call
> ehci_urb_done(), it's being called against a URB that has already been
> freed.

This misses a crucial point: the completion handler is called 
(indirectly) by end_unlink_async().  Therefore end_unlink_async() _has_ 
necessarily already been invoked by the time usb_kill_urb() runs.

> Below is a trace which shows the crash, and how during the completion
> handler the use count is good for URB ebc54410, but because the URB
> won't be resubmitted it drops to zero by the time the usb_kill_urb()
> is called (note, URB ebc54410 is the second to last URB killed before
> the crash).  You can also see at the time the completion handler is
> called for URB ebc54410 that while I've set the flag that the
> completion handler looks for to return without resubmitting, I haven't
> actually issued the usb_kill_urb() call, which is why the status is 0
> instead of -2.

The log doesn't show submissions, so I'll take your word for it that
the ebc54410 URB was never resubmitted.

Then here's what we see: The URB is given back and the completion
handler doesn't resubmit it.  Sometime later we crash because
usb_hcd_unlink_urb_from_ep() was called for that URB (which was freed
in the meantime).

> There's clearly a reference counting problem if the IRQ handler is
> still referring to qh->urb entries in its list, but the underlying URB
> is no longer in a submitted state.  That said, I don't really know
> enough about the internals of the EHCI controller model to know what
> the right fix is.

It may not be a reference counting problem at all.  The scenario I 
outlined above should not be possible because after the URB is given 
back, ehci-hcd should not have any pointers to it in any data 
structures.

This is something you may be able to check.  First, a brief explanation 
of how things are supposed to work:

qh_completions() primarily consists of a big loop:

	list_for_each_safe (entry, tmp, &qh->qtd_list) {
		struct ehci_qtd	*qtd;
		struct urb	*urb;
		u32		token = 0;

		qtd = list_entry (entry, struct ehci_qtd, qtd_list);
		urb = qtd->urb;
		...

This iterates through the qTDs on qh->qtd_list, and it uses the URB
pointer stored in the qTD.

The code calls ehci_urb_done() whenever it reaches a boundary between
two URBs (note that each URB can have multiple qTDs).  Thus, near the
top of the loop it checks to see if the current URB is different from
the one on the last iteration, and it does something similar just after
the loop ends.

When ehci_urb_done() is called, _all_ the qTDs for that URB are 
supposed to have been removed from qh->qtd_list.  There should be 
nothing left that points to the URB being given back.  The only way to 
get another qTD pointing to that URB is if the URB is resubmitted.

Evidently this invariant isn't holding in your situation.  You can test
for it easily enough.  Just before the two places where
qh_completions() calls ehci_urb_done(), loop through all the qTDs on
qh->qtd_list and see if any of them has qtd->urb equal to last->urb.  
If this happens, it's a bug.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux