[Moving this conversation over to the linux-usb mailing list, so I can
get confirmation of the unbind behavior.]

On Fri, Mar 02, 2012 at 06:24:56PM +0000, Colin Ian King wrote:
> Hi again Sarah,
>
> On 28/02/12 18:13, Sarah Sharp wrote:
> >On Mon, Feb 27, 2012 at 11:39:34AM +0000, Colin Ian King wrote:
> >>Hi Sarah,
> >>
> >>We've seen some issues with suspend locking up on Lenovo U300
> >>laptops which can be worked around by unbinding the devices
> >>associated with xhci.
> >>
> >>The device being unbinded is:
> >>
> >>03:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host
> >>Controller (rev 04)
> >>
> >>The curious issue is that the suspend hard locks once we put the
> >>machine into the suspend S3 state after we have written SLP_TYP +
> >>SLP_EN to the PM1 control registers in acpi_enter_sleep_state(), so
> >>at this point the kernel has put the machine into suspend and the
> >>kernel at that point has no further control of the machine.
> >>
> >>See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904261
> >>
> >>Any ideas why unbinding will make S3 work correctly?
> >
> >If the xHCI driver is bound to the PCI device during suspend, it will
> >save the xHCI register state, and ask the host controller to save any
> >context in the scratchpad pages allocated for the host. On resume, it
> >will restore any registers, and ask the host to do a context restore.
> >If that fails (e.g. because the host was used by the BIOS during
> >resume), then we treat it as a power loss and completely reinitialize
> >the host controller.
> >
> >We also never give the host controller back to the BIOS on system
> >suspend. Perhaps the BIOS is spinning, waiting to get control of the
> >xHCI host controller through the BIOS/OS semaphore in the xHCI extended
> >register space?
> >
> >It's possible that we're triggering a BIOS bug because we're doing
> >something the BIOS doesn't expect.
> >Maybe the Windows driver for that
> >machine always unbinds the xHCI driver and gives the BIOS control of the
> >host?
> >
> >I think this is specific to this laptop or BIOS, because I have a Lenovo
> >x220 with the same NEC host controller (and same revision), and it
> >suspends and resumes just fine with no driver modifications. The host
> >controller does signal a context restore failure on every resume, but
> >the xHCI host then reinitializes successfully. Since my laptop's NEC
> >host works with suspend, it seems likely to be specific to that laptop
> >or BIOS. I will try with the same kernel the user ran with to double
> >check (I'm currently running 3.2.0).
> >
> >Did the original bug reporter happen to have
> >CONFIG_USB_XHCI_HCD_DEBUGGING turned on when they tried to suspend? The
> >xHCI debugging polling loop may have tried to touch the PCI register
> >space in the late suspend process, which could be an issue. Not sure if
> >that would cause the hard-hang, but it is a possibility.
>
> We experimented with this off and on, it makes no difference, but we
> at least factored this out.
>
> I'm in a catch-22 position at the moment as Lenovo has provided a
> BIOS image but the user won't test this because he's afraid of
> bricking his new shiny Lenovo u300.

I would really rather they try to update their BIOS, but I understand
not wanting to brick their new laptop. However, it's pretty likely
Lenovo would fix it if they did. They could run duplicity to make a
backup of the drive, and then if Lenovo replaced it, they could just
re-install Ubuntu and use duplicity to get all their data and installed
programs back.

I use something like:

sudo duplicity --exclude /proc --exclude /sys --exclude /mnt --exclude /selinux --exclude /lost+found --exclude /dev --exclude /media --exclude /tmp --exclude /var/cache/apt / file:///media/disk/backups

It worked great for me when I wanted to switch my debian-based laptop to
Ubuntu.
I just excluded the package directories, saved everything else,
installed a fresh version of Ubuntu, and overwrote all the user data.

> However, the user is using a
> workaround as described in: http://thecodecentral.com/2011/01/18/fix-ubuntu-10-10-suspendhibernate-not-working-bug
> - basically they are unbinding on the device.
>
> Is this an ugly hack that will bite them later or is it something we
> can recommend users doing to workaround this issue (if they don't
> want to do a BIOS upgrade).

It's an ugly hack, and yes, it will bite them if they don't understand
what it does.

Forcing an unbind of the xHCI driver is a pretty harsh process. The USB
core acts like all USB devices are disconnected, and then calls the xHCI
shutdown methods. It's basically equivalent to yanking out all your USB
devices on suspend and plugging them back in on resume.

This is not good if you have any USB mass storage devices (USB drives)
attached to the system on suspend. It's like not using "safely remove"
or "eject" before yanking out a USB drive. Sure, 90% of the time just
unplugging the drive will be fine, but 10% of the time you have a dirty
disk cache, and you're yanking the USB device before it can be written.
They could lose data or end up with a corrupted file system. That's why
I tear my hair out when everyone says, "Life's too short to safely
remove."

USB mass storage devices will not persist across suspend and resume, and
will come up as different drive letters. USB audio devices will also
come up as different audio devices, so they'll have to reconfigure their
sound preferences on every suspend and resume. The same thing happens
with USB video devices. If they're running cheese or uvcvideo when they
suspend, the app will say "no such device" on resume, and they will have
to switch video inputs to the re-connected device.

As long as they realize they need to unmount any USB drives before
suspending, they'll be safe, if a little inconvenienced.
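
The unmount advice above could be backed by a small pre-suspend check.
What follows is only a sketch, not anything from the bug report or the
xHCI driver: it assumes USB disks show up with simple sdX names, that
/sys/block/<disk> resolves to a path containing a "usbN" component, and
that mounts are listed in /proc/mounts format. The function names are
made up for illustration.

```python
import os
import re


def mounted_devices(mounts_text):
    """Map each /dev/... source to its mount point (/proc/mounts format)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            result[fields[0]] = fields[1]
    return result


def parent_disk(dev_path):
    """'/dev/sdb1' -> 'sdb'. Assumes plain sdX naming; returns None otherwise."""
    match = re.match(r"(sd[a-z]+)\d*$", os.path.basename(dev_path))
    return match.group(1) if match else None


def is_usb_disk(disk, sys_block="/sys/block"):
    """True if the resolved sysfs path of the disk passes through a usbN bus."""
    real = os.path.realpath(os.path.join(sys_block, disk))
    return any(re.fullmatch(r"usb\d+", part) for part in real.split(os.sep))


def mounted_usb_filesystems(mounts_text, sys_block="/sys/block"):
    """Return sorted (device, mount point) pairs that live on USB disks."""
    found = []
    for dev, mount_point in mounted_devices(mounts_text).items():
        disk = parent_disk(dev)
        if disk and is_usb_disk(disk, sys_block):
            found.append((dev, mount_point))
    return sorted(found)
```

A script run before suspend could pass the contents of /proc/mounts to
mounted_usb_filesystems() and refuse to unbind the xHCI driver while the
returned list is non-empty. Mount points containing spaces appear
escaped in /proc/mounts, and this sketch does not decode them.
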
I would recommend they not set their power preferences to suspend their
laptop when the lid is closed. Otherwise they might forget to unmount
their USB drives and they would lose data. Other than that, it will
just be a little more painful to reconfigure some things after resume.

I just don't like those sorts of hacks floating around the internet.
Everyone says, "Hey, here's the fix" without realizing what they're
passing around, or warning anyone about the ramifications of the fix.
Then I never get reports of bugs in the xHCI driver, because everyone
works around the issue. I would much rather get notified of the bugs,
so that I could possibly start to notice a pattern, and fix any software
issues.

Sarah Sharp