Re: Hitting "unused qh not empty" BUG in qh_destroy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 15 Sep 2014 11:59:15 -0400
Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 13 Sep 2014, Joe Lawrence wrote:
> 
> > Hi Alan,
> > 
> > I've collected 16 crashes since kicking off automated tests a little
> > over 24 hrs ago.
> > 
> > Each crash hit the BUG in qh_destroy() and the only unique debugging
> > printk from ehci_stop() was: "ehci_stop: ehci->num_async = 0".
> 
> What about error messages from the check_async_ring routine?

In the crash instances, the error messages never occurred.  (I had
added a BUG() in that code block, just to be really clear if/when that
event occurred).

> You can check for this.  Sprinkle ehci_info messages throughout 
> ehci_stop, printing the value of ehci->async->qh_next.qh.  It should be 
> NULL the entire time.

I re-ran the tests with the following changes:

  - moved num_async debug up above struct ehci_hcd.priv[0] :)
  - added verbose debug msg in check_async_ring to display the new
    num_async value
  - added ehci_info to every other line in ehci_stop

% grep '0000:66:00.1' vmcore-dmesg.txt

[ 4936.956739] pci 0000:66:00.1: [8086:1d2d] type 00 class 0x0c0320
[ 4936.956789] pci 0000:66:00.1: reg 0x10: [mem 0xa0000400-0xa00007ff]
[ 4936.957019] pci 0000:66:00.1: PME# supported from D0 D3hot D3cold
[ 4939.564933] ehci-pci 0000:66:00.1: EHCI Host Controller
[ 4939.565462] ehci-pci 0000:66:00.1: new USB bus registered, assigned bus number 4
[ 4939.565498] ehci-pci 0000:66:00.1: debug port 2
[ 4939.569466] ehci-pci 0000:66:00.1: cache line size of 64 is not supported
[ 4939.569533] ehci-pci 0000:66:00.1: irq 13, io mem 0xa0000400
[ 4939.574794] ehci-pci 0000:66:00.1: USB 2.0 started, EHCI 1.00
[ 4939.575410] usb usb4: SerialNumber: 0000:66:00.1
[ 4940.089044] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4941.795700] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
[ 4941.895713] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4941.943761] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
[ 4942.002715] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.064346] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
[ 4942.156646] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.226028] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
[ 4942.274185] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.329196] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
[ 4942.391055] ehci-pci 0000:66:00.1: check_async_ring:970 ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.629653] ehci-pci 0000:66:00.1: HC died; cleaning up
[ 4945.663153] ehci-pci 0000:66:00.1: remove, state 1
[ 4945.733934] ehci-pci 0000:66:00.1: ehci_stop:420 ehci(ffff8808419839d0)->num_async = 1
[ 4945.774242] ehci-pci 0000:66:00.1: ehci_stop:427 ehci(ffff8808419839d0)->num_async = 1
[ 4945.814791] ehci-pci 0000:66:00.1: ehci_stop:429 ehci(ffff8808419839d0)->num_async = 1
[ 4945.855938] ehci-pci 0000:66:00.1: ehci_stop:431 ehci(ffff8808419839d0)->num_async = 1
[ 4945.897145] ehci-pci 0000:66:00.1: ehci_stop:433 ehci(ffff8808419839d0)->num_async = 1
[ 4945.938791] ehci-pci 0000:66:00.1: ehci_stop:436 ehci(ffff8808419839d0)->num_async = 1
[ 4945.980491] ehci-pci 0000:66:00.1: ehci_stop:438 ehci(ffff8808419839d0)->num_async = 1
[ 4946.022221] ehci-pci 0000:66:00.1: ehci_stop:440 ehci(ffff8808419839d0)->num_async = 1
[ 4946.063936] ehci-pci 0000:66:00.1: ehci_stop:444 ehci(ffff8808419839d0)->num_async = 1
[ 4946.105631] ehci-pci 0000:66:00.1: ehci_stop:446 ehci(ffff8808419839d0)->num_async = 1
[ 4946.241802] ehci-pci 0000:66:00.1: ehci_stop:448 ehci(ffff8808419839d0)->num_async = 1

where ehci_stop line 420 is function entry and line 448 function
return.  Notice that check_async_ring incremented num_async just before
the HC died/removed and then num_async was 1 throughout ehci_stop.

qh_destroy then BUGs with a struct ehci_hcd *ehci of ffff8808419839d0.

-- Joe

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux