Re: Hitting "unused qh not empty" BUG in qh_destroy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 16 Sep 2014, Joe Lawrence wrote:

> > You can check for this.  Sprinkle ehci_info messages throughout 
> > ehci_stop, printing the value of ehci->async->qh_next.qh.  It should be 
> > NULL the entire time.
> 
> I re-ran the tests with the following changes:
> 
>   - moved num_async debug up above struct ehci_hcd.priv[0] :)
>   - added verbose debug msg in check_async_ring to display the new
>     num_async value
>   - added ehci_info to every other line in ehci_stop
> 
> % grep '0000:66:00.1' vmcore-dmesg.txt
> 
> [ 4936.956739] pci 0000:66:00.1: [8086:1d2d] type 00 class 0x0c0320
> [ 4936.956789] pci 0000:66:00.1: reg 0x10: [mem 0xa0000400-0xa00007ff]
> [ 4936.957019] pci 0000:66:00.1: PME# supported from D0 D3hot D3cold
> [ 4939.564933] ehci-pci 0000:66:00.1: EHCI Host Controller
> [ 4939.565462] ehci-pci 0000:66:00.1: new USB bus registered, assigned bus number 4
> [ 4939.565498] ehci-pci 0000:66:00.1: debug port 2
> [ 4939.569466] ehci-pci 0000:66:00.1: cache line size of 64 is not supported
> [ 4939.569533] ehci-pci 0000:66:00.1: irq 13, io mem 0xa0000400
> [ 4939.574794] ehci-pci 0000:66:00.1: USB 2.0 started, EHCI 1.00
> [ 4939.575410] usb usb4: SerialNumber: 0000:66:00.1
> [ 4940.089044] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4941.795700] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4941.895713] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4941.943761] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.002715] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.064346] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.156646] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.226028] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.274185] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.329196] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.391055] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.629653] ehci-pci 0000:66:00.1: HC died; cleaning up
> [ 4945.663153] ehci-pci 0000:66:00.1: remove, state 1
> [ 4945.733934] ehci-pci 0000:66:00.1: ehci_stop:420 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.774242] ehci-pci 0000:66:00.1: ehci_stop:427 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.814791] ehci-pci 0000:66:00.1: ehci_stop:429 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.855938] ehci-pci 0000:66:00.1: ehci_stop:431 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.897145] ehci-pci 0000:66:00.1: ehci_stop:433 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.938791] ehci-pci 0000:66:00.1: ehci_stop:436 ehci(ffff8808419839d0)->num_async = 1
> [ 4945.980491] ehci-pci 0000:66:00.1: ehci_stop:438 ehci(ffff8808419839d0)->num_async = 1
> [ 4946.022221] ehci-pci 0000:66:00.1: ehci_stop:440 ehci(ffff8808419839d0)->num_async = 1
> [ 4946.063936] ehci-pci 0000:66:00.1: ehci_stop:444 ehci(ffff8808419839d0)->num_async = 1
> [ 4946.105631] ehci-pci 0000:66:00.1: ehci_stop:446 ehci(ffff8808419839d0)->num_async = 1
> [ 4946.241802] ehci-pci 0000:66:00.1: ehci_stop:448 ehci(ffff8808419839d0)->num_async = 1
> 
> where ehci_stop line 420 is function entry and line 448 function
> return.  Notice that check_async_ring incremented num_async just before
> the HC died/removed and then num_async was 1 throughout ehci_stop.
> 
> qh_destroy then BUGs with a struct ehci_hcd *ehci of ffff8808419839d0.

This says that ehci->num_async ended up equal to 1.  But in your 
previous email, you said the final printout indicated it was equal to 
0.  Did something change in the meantime?

Anyway, the log above means that a QH was linked before the HC died,
but then it was never unlinked.  Please add a line at the start of
ehci_endpoint_disable, printing out the value of ep->hcpriv for
comparison with the values listed above.

Also, let's see the contents of *qh and *(qh->hw) for the left-over QH
at *(ehci->async->qh_next.qh).  You can get those right away from your 
crash dump data, right?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux