On 15.05.2017 at 20:56, Alan Stern wrote:
>> Interesting in this case is that we see a "USB disconnect" message
>> for device number 3. And even more strange are the last 3 lines
>> that show that new low-speed USB devices are found even after all
>> USB controllers are shut down.
>
> The shutdown routine for ohci-hcd turns off all of the controller's
> autonomous functionality, but it doesn't stop the kernel from polling
> the controller for port-status changes.  It seems likely that these
> status changes are what give rise to those "new device" messages.

Ok, understood.

>> We also attached a USB analyzer to the system to see what is going on.
>> In the "bad" case we actually see a "resume" on the USB bus when the
>> machine is shut down. Problem is that we cannot see *who* initiated this
>> resume, but my own guess is that it comes from the host controller and
>> not from any HID device.
>
> The host controller is not supposed to initiate a resume signal unless
> the computer tells it to.  It's possible that the kernel is doing this
> -- but it's also possible that the BIOS is.  In fact, I would expect
> the BIOS to do this any time it decided to restart the system.

Well, the BIOS developer was involved when we did the analysis; it's a
colleague who is located in the same building at our site. And BIOS
says they're innocent. ;-)

> (And of course, the resume signal could be coming from an attached
> device.  However, that wouldn't explain why you don't see the signal
> when you run the "good" kernel...)

That is why I assumed that it comes from the controller itself;
otherwise I couldn't explain why it works in the "good" case.

>> - What could be the root cause for this?
>
> It's very hard to say.  I'm inclined to blame the BIOS, but the truth
> is that testing and debugging a kernel while it is shutting down (and
> afterward!) are quite difficult.

Yes, already experienced that.
Well, at least I capture the serial console output, and the last lines
always say:

  ACPI: Preparing to enter system sleep state S5
  PM: Calling mce_syscore_shutdown+0x0/0x10
  PM: Calling ledtrig_cpu_syscore_shutdown+0x0/0x20
  PM: Calling irq_gc_shutdown+0x0/0x60
  PM: Calling i8259A_shutdown+0x0/0x20
  PM: Calling cpufreq_suspend+0x0/0x110
  reboot: Power down
  acpi_power_off called

So I assume I got everything of interest in my capture file.

>> - How can we find out, what further commits have made the situation
>> better in 4.11?
>
> You can always use git bisect to do this.

I'll have a look at this.

>> Any hints are welcome.
>
> You should try doing an rmmod (or unbind) of ehci-pci or ohci-pci or
> both before shutting down.  Maybe the presence or absence of one of the
> drivers will matter.  (Note that after you rmmod or unbind ohci-pci, a
> USB keyboard will become unusable -- you will have to start the
> shutdown beforehand or over a network login.)

> Also, it would be interesting to know whether the patch below has any
> effect.  Even if that effect is just to change the log messages you
> record with the good or bad kernel.
>
> Index: usb-4.x/drivers/usb/core/driver.c
> ===================================================================
> --- usb-4.x.orig/drivers/usb/core/driver.c
> +++ usb-4.x/drivers/usb/core/driver.c
> @@ -1889,8 +1889,26 @@ int usb_set_usb2_hardware_lpm(struct usb
>
>  #endif /* CONFIG_PM */
>
> +/**
> + * usb_dev_shutdown - stop using a USB device when the system shuts down
> + * @dev: device to stop using
> + *
> + * Called by the device core at the start of a system shutdown.
> + * Don't delay the shutdown by taking any mutexes or changing the
> + * device's configuration; just mark its state as NOTATTACHED.
> + * This will prevent any more URBs from being submitted.
> + */
> +static void usb_dev_shutdown(struct device *dev)
> +{
> +	struct usb_device *udev;
> +
> +	udev = to_usb_device(dev);
> +	usb_set_device_state(udev, USB_STATE_NOTATTACHED);
> +}
> +
>  struct bus_type usb_bus_type = {
>  	.name =		"usb",
>  	.match =	usb_device_match,
>  	.uevent =	usb_uevent,
> +	.shutdown =	usb_dev_shutdown,
>  };

Ok. I tried the patch first. It doesn't fix the bad kernel, but the
logs change slightly: those devices that didn't have a shutdown
callback before now have one. This does not solve the problem, though.

Next thing I tried was the unbind approach. Since ehci and ohci were
compiled into the kernel I tried to unbind every USB device I found
under /sys/bus/usb/drivers/, but even with everything gone there the
machine doesn't shut down at the end.

Next approach was to change the kernel config so that ehci and ohci
are modules instead of being compiled into the kernel. Then I booted
the "bad" kernel and did

  rmmod ehci-pci
  rmmod ehci-hcd

The keyboard/mouse still continued to work on my system (which, btw,
is running Ubuntu 16.04 for these tests). But now it's getting strange:

- if I shut down the system at this point with "init 0" from a root
  shell, it performs a shutdown and it turns off! Yeah.

- if I shut down the system at this point by using the shutdown menu
  from the Ubuntu menu, then the shutdown ends up in a kernel panic.

Both results are reproducible: "init 0" shuts the system down and
keeps it off, shutdown from the menu crashes.

Since keyboard/mouse are still functional without the ehci stuff I
tried to blacklist the modules by putting a blacklist-ehci.conf file
into /etc/modprobe.d/ that had 2 lines:

  blacklist ehci_pci
  blacklist ehci_hcd

I also rebuilt the initrd image, but I really couldn't get rid of
those modules; after every new start lsmod still showed the ehci
modules despite the blacklist entries.

Next step was disabling ehci support in the kernel config.
After rebuilding everything I now have a "bad" kernel without ehci
support that boots up and is able to handle keyboard and mouse, and
when I shut down the system (even from the menu) it shuts down and
stays off. So now it seems to behave like the "good" kernel.

So at least we would have a workaround, but I would really prefer to
be able to blacklist those modules, because then our partner could
build his own kernel for the thin client system in the usual way and
the "workaround" would just be to prevent the ehci modules from
loading.

Makes me really wonder if something is wrong with the ehci part of the
hardware on that machine. Well, we also shipped one system to AMD for
further analysis; maybe they can explain this strange behaviour.

Thanks a lot for your input, it was really helpful.

Best regards
Rainer
-- 
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Clients
FJ EMEIA PR PSO PM&D CCD ENG SW OSS&C

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg
Germany

Telephone: +49-821-804-3321
Telefax:   +49-821-804-2131
Mail:      mailto:Rainer.Koenig@xxxxxxxxxxxxxx

Internet         ts.fujtsu.com
Company Details  ts.fujitsu.com/imprint.html
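
As a footnote to the unbind test discussed above: ehci-pci and ohci-pci
are PCI drivers, so the host controllers themselves are bound under
/sys/bus/pci/drivers/ rather than /sys/bus/usb/drivers/. A minimal
sketch of unbinding them at that level could look like the following.
It assumes sysfs is mounted at /sys and that it is run as root. The
function name unbind_pci_driver and the SYSFS_ROOT override (useful for
a dry run against a fake directory tree) are made up for this
illustration and are not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: unbind every PCI device currently bound to one driver.
# SYSFS_ROOT is a hypothetical override for dry runs; normally it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# unbind_pci_driver is an illustrative helper name, not an existing command.
unbind_pci_driver() {
    drv_dir="$SYSFS_ROOT/bus/pci/drivers/$1"
    if [ ! -d "$drv_dir" ]; then
        echo "driver $1 is not registered" >&2
        return 1
    fi
    # Bound devices show up as symlinks named by PCI address
    # (domain:bus:slot.function); other entries such as "bind" or
    # "unbind" do not match this pattern.
    for dev in "$drv_dir"/*:*:*.*; do
        # Skip the literal pattern when nothing is bound.
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        addr=$(basename "$dev")
        echo "unbinding $addr from $1"
        # Writing the address to the driver's unbind file detaches the device.
        echo "$addr" > "$drv_dir/unbind"
    done
}
```

It would be called as "unbind_pci_driver ehci-pci" (or with ohci-pci)
shortly before the shutdown. As noted in the quoted text, a USB
keyboard becomes unusable once ohci-pci is unbound, so this is best
done over a network login.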