On 16.1.2023 18.59, Ladislav Michl wrote:
Hi Mathias,
On Mon, Jan 16, 2023 at 04:22:11PM +0200, Mathias Nyman wrote:
From: Jimmy Hu <hhhuuu@xxxxxxxxxx>
When the host controller is not responding, all URBs queued to all
endpoints need to be killed. This can cause a kernel panic if we
dereference an invalid endpoint.
Fix this by using xhci_get_virt_ep() helper to find the endpoint and
checking if the endpoint is valid before dereferencing it.
I'm a bit confused this goes in and even to stable. Let me quote your
own analysis from
Message-ID: <0fe978ed-8269-9774-1c40-f8a98c17e838@xxxxxxxxxxxxxxx>
On Thu, Dec 22, 2022 at 03:18:53PM +0200, Mathias Nyman wrote:
I think root cause is that freeing xhci->devs[i] and including rings isn't
protected by the lock, this happens in xhci_free_virt_device() called by
xhci_free_dev(), which in turn may be called by usbcore at any time
So xhci->devs[i] might just suddenly disappear
Patch just checks more often if xhci->devs[i] is valid, between every endpoint.
So the race between xhci_free_virt_device() and xhci_kill_endpoint_urbs()
doesn't trigger null pointer deref as easily.>
I believe the above is correct and even Jimmy was unable to verify your
later patch (3rd in this serie), which brings a question how could be this
patch tested. It just burns a bug a bit deeper and I do not think it is the
right approach.
As I said in a direct response to the original patch I think this is a valid fix
for older kernels where we used to unlock xhci->lock while giving back
URBs. Together with PATCH 3/7 the issue should be completely resolved.
For later kernels PATCH 3/7 should be enough by itself, but no harm in keeping this.
See Message-ID: <379b395f-b65c-96fe-7ecc-f18e3740b990@xxxxxxxxxxxxxxx>
Older kernels are all before v5.5 that lack commit
36dc01657b49 usb: host: xhci: Support running urb giveback in tasklet context.
I haven't been able to trigger this issue myself, but based on the report and finding in
the code I still think this the right approach. The internal testing this has been
through could only prove these patches (2/7 and 3/7) don't cause any additional issues.
If you think the analysis or solution is incorrect let me know, and we can come up with a
better one.
Thanks
Mathias