[linux-pm] Toward runtime power management in Linux

stern at rowland.harvard.edu (Alan Stern) · Thu Aug 25 08:27:46 2005

On Thu, 25 Aug 2005, Benjamin Herrenschmidt wrote:

> > Recursion: A consequence of doing things this way is that the
> > notifications can potentially use a lot of stack space as they
> > progress up the device tree.  (I can't think of any simple
> > non-recursive technique for implementing the scheme.)  Fortunately
> > this probably won't be too bad; the notifications will stop when they
> > reach a device that doesn't want to change its state (because it has
> > other children).  So the recursion should not involve too many levels.
> > Still, it is something to watch out for.
> 
> I think the locking is unmanageable if we do synchronous notification. I
> think devices should issue a "request for link state change" and be
> asynchronously notified of the actual change, though they might be
> allowed to block if they are doing that from process and/or probe()
> context (most of the time). I think the actual tree walking should be
> done by a separate thread. That simplifies locking tremendously by
> suppressing most deadlock conditions.

But it complicates the structure of drivers by forcing them to use a
state-machine approach with a bunch of tedious intermediate states ("link
change request issued, waiting for notification").  It also introduces
complications from the possible arrival of several change requests or
notifications at once, since there's no locking to guarantee mutual
exclusion among them.

> Another problem is the whole racyness of tree & list walking vs.
> add/remove. We try to work around it in various ways that I think can't
> work. See the small thread about pci_walk_bus() on lkml that exposes
> that kind of races vs. device removal. I think we really need to
> indroduce proper iterator objects that get notified on removal.
> 
> Racing with device addition is a different issue altogether. It means we
> may add devices to already-walked part of the tree, thus have
> inconsistent states... unless drivers are made properly aware that the
> link state may not be full-on at probe time and deal with that.

I think these races can be handled without too much trouble.  Consider the 
two cases you mentioned.

	Device addition racing with parent suspend: adding a device to
	an already-suspended parent.  The parent's driver should be
	written in such a way that it hold's the parent's PM lock
	while determining that a new child has been found.  If the parent
	is capable of detecting a new child while suspended, then it
	should also be capable of adding the child device.  If it's not
	capable of detecting new children while suspended then the issue
	doesn't arise.

	Device removal racing with power state changes.  During device
	removal the PM lock should be held; this will prevent the two
	operations from overlapping.  You might end up trying to carry
	out a power-state change on an already-removed device; that's
	not a problem -- it will simply fail.

On Thu, 25 Aug 2005, Brown, Len wrote:

> >I think the locking is unmanageable if we do synchronous
> >notification.
>
> Hmmm, good point.
> But how frequent do we expect walks and add/remove to be?
> If they almost never happen then it sounds like over-design
> to have more than a single lock.

RTPM changes won't be terribly frequent.  But add/remove, while maybe not
very frequent in an average sense, do occur in clusters.  The lock you're
proposing would end up being a single chokepoint for all devices.  That's
exactly the sort of thing removal of the bus subsystem's rwsem was
intended to eliminate.

Alan Stern