Re: Hitting "unused qh not empty" BUG in qh_destroy

On Fri, 12 Sep 2014, Joe Lawrence wrote:

> On Fri, 12 Sep 2014 11:31:46 -0400
> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Thu, 11 Sep 2014, Joe Lawrence wrote:
> > 
> > > Hi Alan,
> > > 
> > > I've got another USB bug to report that manifests during automated
> > > device removal testing on RHEL7.  This one hits the BUG() inside
> > > qh_destroy:
> > 
> > How reliably can you trigger this bug?
> 
> I have collected a few crashes within a few days, so somewhat
> frequently.
> 
> > > static void qh_destroy(struct ehci_hcd *ehci, struct ehci_qh *qh)
> > > {
> > >         /* clean qtds first, and know this is not linked */
> > >         if (!list_empty (&qh->qtd_list) || qh->qh_next.ptr) {
> > >                 ehci_dbg (ehci, "unused qh not empty!\n");
> > >                 BUG ();
> > >         }
> > 
> > > and finally a dump of the ehci_qh in question:
> > > 
> > > crash> struct ehci_qh ffff88084b84dc80
> > > struct ehci_qh {
> > >   hw = 0xffff880078d1a000, 
> > 
> > It would be good to see the contents of the ehci_qh_hw structure.  That 
> > would tell us what device and endpoint this QH was for.
> 
> crash> struct ehci_qh_hw 0xffff880078d1a000
> struct ehci_qh_hw {
>   hw_next = 0x78d1a062, 
>   hw_info1 = 0x8000, 

No maxpacket value, device address, or endpoint number, but the QH_HEAD
bit is set.  That happens only with the head of the async ring.  And
indeed, the QH address agrees with ehci->async = 0xffff88084b84dc80 in
your earlier email.
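
For reference, here is a minimal sketch of how hw_info1 breaks down into
the fields mentioned above.  The bit positions follow the EHCI queue head
"endpoint characteristics" layout (QH_HEAD is bit 15); the names
struct qh_info1 and decode_info1 are made up for illustration and are not
part of the driver.  It also assumes the value has already been converted
to CPU byte order.

```c
#include <stdint.h>

/* Hypothetical container for the fields of interest in hw_info1. */
struct qh_info1 {
	unsigned int devaddr;	/* bits 6:0   device address */
	unsigned int endpoint;	/* bits 11:8  endpoint number */
	unsigned int head;	/* bit 15     H bit (head of reclamation list) */
	unsigned int maxpacket;	/* bits 26:16 maximum packet length */
};

/* Split a CPU-endian hw_info1 value into its fields. */
static struct qh_info1 decode_info1(uint32_t info1)
{
	struct qh_info1 d;

	d.devaddr = info1 & 0x7f;
	d.endpoint = (info1 >> 8) & 0xf;
	d.head = (info1 >> 15) & 0x1;
	d.maxpacket = (info1 >> 16) & 0x7ff;
	return d;
}
```

Feeding it the value from the dump, decode_info1(0x8000), leaves devaddr,
endpoint, and maxpacket all zero with only head set.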

>   hw_info2 = 0x0, 
>   hw_current = 0x0, 
>   hw_qtd_next = 0x1, 
>   hw_alt_next = 0x78d22000, 
>   hw_token = 0x40, 
>   hw_buf = {0x0, 0x0, 0x0, 0x0, 0x0}, 
>   hw_buf_hi = {0x0, 0x0, 0x0, 0x0, 0x0}
> }
> 
> > >   qh_dma = 0x78d1a000, 
> > >   qh_next = {
> > >     qh = 0xffff88084efe6730, 
> > >     itd = 0xffff88084efe6730, 
> > >     sitd = 0xffff88084efe6730, 
> > >     fstn = 0xffff88084efe6730, 
> > >     hw_next = 0xffff88084efe6730, 
> > >     ptr = 0xffff88084efe6730                     << !NULL
> > >   }, 

So there's a leftover qh_next pointer, presumably to a QH that used to
be on the async list but no longer exists.

This means the list pointers got corrupted somehow.  No way at this
point to know just how.  You can add some debugging code to check the
links at the end of qh_link_async (which adds a new QH to the async
list) and single_unlink_async (which removes a QH from the list).

Something like this:

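/*
 * Walk the async list from the head and compare its length against
 * the running count in ehci->num_async.  "add" is +1 after a link
 * and -1 after an unlink.
 */
static void check_async_ring(struct ehci_hcd *ehci, int add)
{
	struct ehci_qh *qh;
	int n;

	qh = ehci->async->qh_next.qh;
	n = ehci->num_async += add;

	/* Follow at most num_async links so a loop can't hang us */
	while (qh && n > 0) {
		qh = qh->qh_next.qh;
		--n;
	}

	/* Leftover qh: list too long.  Leftover n: list too short. */
	if (qh || n != 0)
		ehci_err(ehci, "EHCI async list corrupted: num %d n %d qh %p\n",
				ehci->num_async, n, qh);
}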

Add an int num_async field to the ehci_hcd structure, and then add

	check_async_ring(ehci, 1);

at the end of qh_link_async and

	check_async_ring(ehci, -1);

at the end of single_unlink_async.  Then maybe for thoroughness, print 
out the value of ehci->num_async in ehci_stop if it is nonzero.
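
The counting check can be tried outside the kernel.  The following is only
a userspace sketch of the same idea, not driver code: struct qh, struct
ring, ring_link, ring_unlink, and check_ring are hypothetical stand-ins for
ehci_qh, ehci_hcd, qh_link_async, single_unlink_async, and
check_async_ring, and it assumes new entries are inserted right after the
head.  check_ring returns -1 where the kernel version would call ehci_err.

```c
#include <stddef.h>

/* Stand-in for ehci_qh: only the software "next" link matters here. */
struct qh {
	struct qh *next;
};

/* Stand-in for ehci_hcd: a dummy head entry plus the expected count. */
struct ring {
	struct qh head;		/* plays the role of ehci->async */
	int num;		/* plays the role of ehci->num_async */
};

/* Same walk as check_async_ring; 0 if consistent, -1 if corrupted. */
static int check_ring(struct ring *r, int add)
{
	struct qh *qh = r->head.next;
	int n = r->num += add;

	while (qh && n > 0) {
		qh = qh->next;
		--n;
	}
	return (qh || n != 0) ? -1 : 0;
}

/* Insert qh right after the head, then verify the list length. */
static int ring_link(struct ring *r, struct qh *qh)
{
	qh->next = r->head.next;
	r->head.next = qh;
	return check_ring(r, 1);
}

/* Remove qh if it is on the list, then verify the list length. */
static int ring_unlink(struct ring *r, struct qh *qh)
{
	struct qh *prev = &r->head;

	while (prev->next && prev->next != qh)
		prev = prev->next;
	if (prev->next == qh)
		prev->next = qh->next;
	return check_ring(r, -1);
}
```

Linking and unlinking normally keeps every call returning 0.  Clearing
head.next behind the model's back while num is still nonzero, or unlinking
an entry that is not on the list, makes the next check return -1, which is
the kind of mismatch the debugging patch is meant to catch.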

> > The place to look is ehci_endpoint_disable.  Did that routine get 
> > called for this QH?  Did it hit the default case of the big switch 
> > statement (with its ehci_err statement)?
> 
> Not sure if there is enough residual side-effect data in a crash dump
> to determine if ehci_endpoint_disable executed.  However, the QH that
> qh_destroy was handling did *not* have the exception bit set.  (See the
> first mail for the structure dump.)

Yeah, the head of the ring isn't a "real" QH, so it never gets
disabled.  Whatever it was pointing to must have been unlinked and
disabled at some time, though.

Alan Stern
