On Sun, 2024-11-17 at 15:35 +0100, Michał Pecio wrote: > Hi, > > > Basically the issue comes from hub_port_connect. > > > > drivers/usb/core/hub.c > > > > hub_port_init returns -71 -EPROTO and jumps to loop > > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L5450 > > > > I'd question if usb_ep0_reinit is really required in loop which is > > running following functions: > > usb_disable_endpoint(udev, 0 + USB_DIR_IN, true); > > usb_disable_endpoint(udev, 0 + USB_DIR_OUT, true); > > usb_enable_endpoint(udev, &udev->ep0, true); > > > > this is something only experience over the past decades can tell? > > > > usb_enable_endpoint will trigger xhci_endpoint_reset which doesn't > > do > > much, but crashes the entire system with the upstream kernel when > > it > > triggers xhci_check_bw_table). > > > > I removed usb_ep0_reinit here and devices are still workable under > > various conditions (again I shorted and pulled D+/D- to ground for > > testing). > > xHCI isn't the only host controller supported by Linux, and > usb_ep0_reinit() predates it. Maybe it's pointless today, maybe > it isn't, but it's not the root cause of your problem anyway. > exactly, but it shouldn't go there anyway. This section of the code is only for 'in case an error already happened'. That's why I'm asking if anyone knows a practical situation where this is really needed - and did it ever help? I'm working with USB across many systems for nearly 2 decades and I never saw any of those fallbacks succeeding. Usually the way to recover a device which had connection issues was to reconnect the device completely. > > The NULL PTR check in xhci_check_bw_table would be a second line > > of defense but as indicated in the first mail it shouldn't even get > > there. > > It's an xHCI bug that BW calculation is attempted on an uninitialized > device and crashes. Looks like a NULL check somewhere is exactly what > is needed, or maybe avoid it completely on EP0 (it's probably no-op). > Yes this would be the second line of defense as indicated in my email before. I'm preparing 2 small patches with comments in the code. One commenting out usb_ep0_reinit and the other one a simple NULL PTR check - but again the code shouldn't be executed at all when bw_table == NULL something already went wrong, the issue should be handled earlier in the code already. bw_table is not setting itself to NULL or not at all. Since it's infrastructure code it's a sensitive part. This issue fully crashed my Asus notebook at least 3 times when working with a USB ethernet adapter and a USB harddisk. Not only faulty hardware is causing that problem also simple hotplug connection problems. > Other similar bug recently: > https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@xxxxxxxxx/T/#u > > Yours too should be unique to those Intel Panther Point chipsets. This is exactly the same problem yes. The problem will happen with all systems which use the xhci driver. Best Regards, Markus > > > As a second issue I found in usb_reset_and_verify device > > https://github.com/torvalds/linux/blob/master/drivers/usb/core/hub.c#L6131 > > > > ret = hub_port_init(parent_hub, udev, port1, i, > > &descriptor); > > if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) { > > break; > > } > > > > hub_port_init can also return -71 / -EPROTO, the cases should be > > very > > rare when usb_reset_and_verify_device is triggered and that > > happens. > > Right, and the intent seems to be to simply retry in this case. > > Regards, > Michal