On Monday, 25 of February 2008, Alan Stern wrote: > Ongoing efforts to remove the freezer from the system suspend and > hibernation code ("system sleep" is the proper catch-all term) have > turned up a fundamental flaw in the Power Management subsystem's > design. In brief, we cannot handle the race between hotplug addition > of new devices and suspending all existing devices. > > It's not a simple problem (and I'm going to leave out a lot of details > here). For a comparison, think about what happens when a device is > hot-unplugged. When device_del() calls the driver's remove method, the > driver is expected to manage all the details of synchronizing with > other threads that may be trying to add new child devices as well as > removing all existing children. > > But when a system sleep begins, the PM core is expected to suspend > all the children of a device before calling the device driver's suspend > method. If there are other threads trying to add new children at the > same time, it's the PM core's responsibility to synchronize with > them -- an impossible job, since only the device's driver knows what > those other threads are and how to stop them safely. It's not a problem if new children are registered before the parent's ->suspend() is called, the PM core can handle that. The problem is the potential race between the suspending task and the threads registering new children concurrently to the executing ->suspend(), because if those threads lose the race, the resume ordering will be broken. Since the PM core knows nothing about the drivers internals, the drivers' ->suspend() methods must be responsible for synchronizing with the other threads used by the driver. > In the past this deficiency has been hidden by the freezer. Other > tasks couldn't register new children because they were frozen. But now > we are phasing out the freezer (already most kernel threads are not > freezable) and the problem is starting to show up. > > A change to the PM core present in 2.6.25-rc2 (but which is about to be > reverted!) has the core try to prevent these additions by acquiring the > device semaphores for every registered device. This has turned out to > be too heavy-handed; for example, it prevents drivers from > unregistering devices during a system sleep. There are more subtle > synchronization problems as well. I think we just attempted to take device semaphores too early. We probably can take the device semaphores _after_ suspending all devices without much hassle. However, it's not actually a problem if a suspended device gets unregistered - it's removed from the list on which it is at the moment and won't be resumed. It also is not a problem if the device is registered after it's master's ->resume() has run. Besides, taking the semaphores for all _existing_ devices doesn't prevent new devices from being added and that's we needed to take pm_sleep_rwsem in device_add(). > The only possible solution is to have the drivers themselves be > responsible for preventing calls to device_add() or device_register() > during a system sleep. (It's also necessary to prevent driver binding, > but this isn't a major issue.) The most straightforward approach is to > add a new pair of driver methods: one to disable adding children and > one to re-enable it. Of course this would represent a significant > addition to the Power Management driver interface. I'd rather not do that. > (Note that the existing suspend and resume methods cannot be used for > this purpose. Drivers assume that when the suspend method is called, > it has already been called for all the child devices. This wouldn't be > true if one of the purposes of the method was to prevent addition of > new children.) > > Another way of accomplishing this is to require drivers to pay > attention to pm_notifier chain and stop registering children when any > of the PM_xxx_PREPARE messages is sent. This approach feels a lot more > awkward to me. As I said above, I don't see a problem with registering new devices before the parent's ->suspend() is run and after it's ->resume() has run. That doesn't break any ordering rules and we can handle it. Thanks, Rafael _______________________________________________ linux-pm mailing list linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/linux-pm