Re: [GIT PULL] PM updates for 2.6.33

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 7 Dec 2009 15:37:12 -0500 (EST)

On Mon, 7 Dec 2009, Linus Torvalds wrote:

> See my second email, where I think I can get rid of the whole second pass 
> thing. I think you'll agree that it's an even nicer mirror image.

Yes, I like this approach better and better.

There is still a problem.  In your code outlines, you have presented a
classic depth-first (suspend) or depth-last (resume) tree algorithm.  
But that's not how the PM core works.  Instead it maintains dpm_list, a
list of all devices in order of registration.  Suspends and resumes are
carried out by iterating along this list, in the reverse and forward
directions respectively.

There are two advantages.  The matter of stack usage, of course.  But
more importantly, this order of devices is guaranteed to work.  For any
device D, we _know_ that the system can function properly in
circumstances where everything on dpm_list before D is active and
everything after D is inactive -- because that's the state the system
was in when D was registered.  Any other order risks errors because of
unknown dependencies.

The consequence is that there's no way to hand off an entire subtree to 
an async thread.  And as a result, your single-pass algorithm runs into 
the kind of "stall" problem I described before.

(In theory we could convert over to a tree algorithm.  IMO that would 
be nearly as dangerous as going to a full-fledged totally async 
scheme.)

But all is not lost.  We can still get what we want using a two-pass
list algorithm, where one of the passes is contained within the PM core
-- no extra callbacks are needed.  Here's how suspend would work:

	dpm_suspend()	/* Suspend all devices on dpm_list */
	{
		list_for_each_entry_reverse(dev, dpm_list, ...) {

			/* Make the parent wait for dev */
			down_read(dev->parent->lock);
			if (dev->async_pm)
				async_schedule(device_suspend, dev);
		}
		list_for_each_entry_reverse(dev, dpm_list, ...) {
			if (!dev->async_pm)
				device_suspend(dev);
		}
		async_synchronize_full();
	}

	device_suspend(dev)	/* Suspend a single device */
	{
		/* Wait until all the children are suspended */
		down_write(dev->lock);
		dev->bus->suspend(dev);
		up_write(dev->lock);

		/* Tell the parent we are finished */
		up_read(dev->parent->lock);
	}

I have glossed over a bunch of details, such as the fact that
device_suspend() really takes two arguments.  And it's necessary to be
more careful with the list operations than shown here, because devices
can be unregistered while all this is going on.

Still, this seems reasonable.  Bus subsystems and drivers can set the
dev->async_pm flag as desired, and they can use the new rwsems to
handle special dependencies without involving the PM core.  No new
callbacks are needed, nor any changes to existing methods.  
(Convincing lockdep that all this fancy footwork is valid may require
some effort, though.)

By the way, this bears a striking resemblance to Rafael's patch.  The
biggest difference is the use of the new rwsem for dependency
resolution, instead his somewhat cumbersome constraint structures.

> > There's some question about what to do if a suspend or resume fails.  A
> > bunch of async threads will have been launched for other devices, but
> > now there won't be anything to wait for them.  It's not clear how this
> > should be handled.
> 
> I think the rule for "suspend fails" is very simple: you can't fail in the 
> async codepath. There's no sane way to return errors, and trying to would 
> be too complex anyway. What would you do? 

You could prevent the suspend procedure from going any further and
abort the entire system sleep.  If you wanted to.

> In fact, even though we _can_ fail in the synchronous path, I personally 
> consider a device driver that ever fails its suspend to be terminally 
> broken. We're practically always better off suspending and simply turning 
> off the power than saying "uh, I failed the suspend".
> 
> I've occasionally hit a few drivers that caused suspend failures, and each 
> and every time it was a driver bug, and the right thing to do was to just 
> ignore the error and suspend anyway - returning an error code and trying 
> to undo the suspend is not what anybody ever really wants, even if our 
> model _allows_ for it.

There is a valid reason for aborting a sleep transition: the driver has
received a remote wakeup request.  Wakeup requests race with sleep, of
course.  A request coming after the system is asleep will wake it up;
one coming before the system is asleep should either cause it to wake
up immediately after shutting down or prevent the sleep entirely.

Causing the system to wake up immediately needs hardware support.  But
by the time the kernel is aware of a wakeup request, the request is
generally no longer present in the hardware.  (For example, an
interrupt has been delivered and the IRQ line is no longer active.)  
So the only remaining choice is to abort the sleep transition.

> (And the rule for "resume fails" is even simpler: there's nothing we can 
> really do if something fails to resume - and that's true whether the 
> failure is synchronous or asynchronous. The device is dead. Try to reset 
> it, or remove it from the device tree. Tough).

Right.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html