Re: Highly critical bug in XHCI Controller

Markus Rechberger <linuxusb.ml@xxxxxxxxxx> · Sun, 17 Nov 2024 20:44:16 +0800

Basically the issue comes from hub_port_connect.

drivers/usb/core/hub.c

hub_port_init returns -71 -EPROTO and jumps to loop
https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L5450

I'd question if usb_ep0_reinit is really required in loop which is
running following functions:
    usb_disable_endpoint(udev, 0 + USB_DIR_IN, true);
    usb_disable_endpoint(udev, 0 + USB_DIR_OUT, true);
    usb_enable_endpoint(udev, &udev->ep0, true);

this is something only experience over the past decades can tell?

usb_enable_endpoint will trigger xhci_endpoint_reset which doesn't do
much, but crashes the entire system with the upstream kernel when it
triggers xhci_check_bw_table).

I removed usb_ep0_reinit here and devices are still workable under
various conditions (again I shorted and pulled D+/D- to ground for
testing).
The NULL PTR check in xhci_check_bw_table would be a second line of
defense but as indicated in the first mail it shouldn't even get there.

As a second issue I found in usb_reset_and_verify device 
https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L6131

        ret = hub_port_init(parent_hub, udev, port1, i, &descriptor);
        if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) {
            break;
        }

hub_port_init can also return -71 / -EPROTO, the cases should be very
rare when usb_reset_and_verify_device is triggered and that happens.

I'm just waiting for comments now since this is some critical piece of
infrastructure code before proceeding with a patch.

On Sun, 2024-11-17 at 15:33 +0800, Markus Rechberger wrote:
> Hi,
> 
> 
> the issue was first reported at vdr-portal.de
> https://www-vdr--portal-de.translate.goog/forum/index.php?thread/136541-empfehlung-dvb-s2-tuner-oder-satip/&postID=1376196&_x_tr_sl=de&_x_tr_tl=en&_x_tr_hl=de&_x_tr_pto=wapp#post1376196
> 
> we've got around a highly critical bug in the xhci driver.
> 
> https://sundtek.de/support/uxvd32.txt
> 
> In xhci.c
> 
> The bug is still active in Mainline:
> https://github.com/torvalds/linux/blob/master/drivers/usb/host/xhci.c#L2382
> 
> static int xhci_check_bw_table(struct xhci_hcd *xhci,
>         struct xhci_virt_device *virt_dev,
>         int old_active_eps)
> 
> bw_table can end up with a NULL pointer.
> 
> This problem will lead to a complete kernel crash, rendering the
> entire
> system unusable without any access to the actual linux system.
> 
> How to trigger the problem?
> Short D+/D- or pull them to ground on a USB device while connecting
> the
> device.
> 
> The problem can happen due to following cases:
> * a device is getting suddenly disconnected during enumeration
> * a faulty cable
> * a faulty device 
> * a malicious device triggers this issue on purpose
> * if there are electrical issues during connecting a device.
> 
> A quick hotfix would be to check if bw_table is NULL in
> xhci_check_bw_table, however the check should be performed earlier -
> in
> the area where bw_table is supposed to be assigned.
> 
> Best Regards,
> Markus