Re: Wakeup-events implementation

On Tuesday, August 17, 2010, Alan Stern wrote:
> On Mon, 16 Aug 2010, Arve Hjønnevåg wrote:
> 
> > >> A name is still useful if we allocate multiple handles per device (see
> > >> below for why one handle per device is not enough).
> > >
> > > Are the input queues the only place where this is needed?
> > >
> > 
> > I don't know if it is the only place that is using more than one
> > wakelock with a timeout per device. There are other drivers that use
> > more than one wakelock, but it is possible they could use a single
> > nested lock instead (with some detail loss in the stats). I'm not
> > sure every wakelock can be associated with a device though. For
> > instance the mmc core wakelock does not have a corresponding device.
> 
> What other drivers need more than one wakelock per device?
> 
> For that matter, why does the MMC core need wakelocks at all?  I wasn't
> aware that memory cards could generate wakeup events.

Yes, that's interesting, although I _guess_ this is related to the events
generated by inserting/removing a card.

> > > We'll figure out the best approach.  You mentioned that when the
> > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
> > > seconds.  For these cases, would it be so terrible just to use the
> > > timeout alone and forget about pm_relax?
> > >
> > 
> > The timeouts used when mixing timeouts and wake_unlock are generally
> > longer than when using only timeouts, since the timeouts are used
> > to cover edge cases where wake_unlock did not get called. Allowing
> > timeouts and pm_relax to be mixed may allow a gradual path from using
> > timeouts to block suspend only when needed (timeouts only -> timeouts
> > for edge cases -> no timeouts).
> 
> In fact, I rather expected you would say that the combined
> cancellations/timeouts were useful for pinpointing applications that
> don't read their input events quickly enough.  If the wakelocks were
> always allowed to time out instead of being cancelled when the data was
> read, you wouldn't be able to tell whether the data was read in a
> timely manner or not.
> 
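
Incidentally, to make the mixed model being discussed here concrete, here is a
toy sketch, not any existing kernel or Android API: the timeout acts only as a
safety net, and an explicit relax ends the request early, so statistics could
still record whether the relax arrived before the deadline or not.

/* Illustrative names only; locking and real timers omitted. */
#include <stdbool.h>
#include <time.h>

struct wakeup_handle {
	bool active;		/* suspend is blocked while this is set */
	time_t deadline;	/* 0 means no timeout is armed */
};

static void stay_awake_timeout(struct wakeup_handle *h, unsigned int secs)
{
	h->active = true;
	h->deadline = time(NULL) + secs;
}

static void relax(struct wakeup_handle *h)
{
	h->active = false;	/* the explicit relax arrived in time */
	h->deadline = 0;
}

/* Called from the suspend decision path. */
static bool may_suspend(struct wakeup_handle *h)
{
	if (!h->active)
		return true;
	if (h->deadline && time(NULL) >= h->deadline) {
		h->active = false;	/* the safety-net timeout fired instead */
		h->deadline = 0;
		return true;
	}
	return false;
}
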
> > We have also had bugs, where sensors generate input events whenever
> > the system is not suspended. The 5 second timeout we have on input
> > events is not short enough to allow the system to suspend at all in
> > this case.
> 
> This sounds like the sort of bug that would be found very quickly and 
> easily if you didn't have active cancellation.  It would be pretty 
> obvious when the system didn't suspend at all.
> 
> > I see the kernel as trying to suspend if user space asked it to
> > suspend (until user space asks the kernel to stop suspending).
> > Wakeup events that don't reach user space should not need user space
> > support to reenter suspend.
> 
> That's debatable.  For example, consider a Wake-On-LAN magic packet.  
> It generally doesn't go anywhere -- it gets dropped once the system is 
> awake.

Yes.  I think every event that would wake up a sleeping system should result
in unfreezing user space.

> > You want each suspend attempt to be initiated by user space. Suspend
> > is initiated from user space in both cases,

Not really.  With wakelocks suspend may be initiated as a result of a driver
dropping its wakelock, which doesn't qualify as "from user space" to me.

> > the difference is that you want the suspend request to be one-shot, which
> > means that the kernel may cancel the user space suspend request, whereas I
> > want user space to explicitly cancel the request.
> 
> Yes, that's a fair way to describe the situation.

I somewhat agree.

> > > Do you have any examples where two subsystems use the same device but
> > > with differing expectations?  In the examples I can think of, the
> > > non-nesting call is used only when a wakeup request is received (which
> > > happens only when the device is at low power), so the nesting calls
> > > can't interfere.
> > >
> > 
> > The wakeup request needs to block suspend until the driver has seen the
> > event. If the driver and the wakeup code use the same handle, how can
> > you avoid interference?
> 
> Let's say the interrupt handlers responsible for invoking
> pm_request_resume use non-nesting calls.  This may happen several
> times, depending on the wakeup and interrupt delivery mechanism, but
> the end result is that the counter is incremented only once.  When the
> pm_runtime_resume call ends, pm_relax does a single decrement.
> 
> Meanwhile, the driver uses nesting calls.  It can increment the counter
> as much as it wants, and it has to perform an equal number of
> decrements.  The key point is that pm_request_resume never gets called
> while the driver has any outstanding pm_stay_awake requests.  Hence the 
> non-nesting calls never get "lost", since the first one is always made 
> while the counter is 0.
> 
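
Just to spell out the bookkeeping described above, here is a small sketch with
made-up names (and with all locking omitted): the non-nesting variant bumps
the per-handle counter at most once until its matching relax, while the
nesting/cumulative variant must always be balanced by hand.

#include <stdbool.h>

struct wakeup_handle {
	unsigned int count;	/* suspend stays blocked while count > 0 */
	bool active;		/* tracks the single non-nesting reference */
};

/* Non-nesting: repeated calls increment the counter only once. */
static void stay_awake_nonnested(struct wakeup_handle *h)
{
	if (!h->active) {
		h->active = true;
		h->count++;
	}
}

static void relax_nonnested(struct wakeup_handle *h)
{
	if (h->active) {
		h->active = false;
		h->count--;
	}
}

/* Nesting/cumulative: every call needs a matching relax_nested(). */
static void stay_awake_nested(struct wakeup_handle *h)
{
	h->count++;
}

static void relax_nested(struct wakeup_handle *h)
{
	h->count--;
}
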
> > > We may end up attaching these wakeup structures not just to devices but
> > > also to input queues (or other kernel->userspace communication
> > > channels).
> > >
> > 
> > That is what we do now (with wakelocks), but I think we need to pass this
> > handle to the pm_stay_awake... calls, and not the device, for this to
> > work.
> 
> You may be right.  Or we may need additional calls taking the handle as
> an argument rather than the device.  The design requirements are still
> simmering in my mind, and I'm waiting to hear what Rafael thinks.

I actually like the idea of having two levels of calls, i.e. pm_stay_awake(dev)
calling something like __pm_stay_awake(handle), where the handle is read from
struct dev_pm_info.  And analogously for pm_relax().
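
Roughly like this, as a sketch only; the wakeup object and the dev_pm_info
member name below are placeholders, not an existing interface:

/* Handle-level calls, usable with or without a struct device. */
struct wakeup_source;
void __pm_stay_awake(struct wakeup_source *ws);
void __pm_relax(struct wakeup_source *ws);

/* Device-level wrappers; dev->power is the device's struct dev_pm_info. */
static inline void pm_stay_awake(struct device *dev)
{
	if (dev->power.wakeup)
		__pm_stay_awake(dev->power.wakeup);
}

static inline void pm_relax(struct device *dev)
{
	if (dev->power.wakeup)
		__pm_relax(dev->power.wakeup);
}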

Then, the object associated with the handle might be created when the device
is enabled for wakeup and destroyed when wakeup is disabled again.

OTOH, if it's really necessary to use a "raw" handle (i.e. without struct
device), the caller of __pm_stay_awake() will allocate and free the
"wakelock" object by itself as necessary.

As far as the names are concerned, I'd like to avoid allocating space for them
whenever it is not necessary (like when the "wakelock" is pointed to by struct
dev_pm_info, in which case it is simply sufficient to use the device name to
refer to it).

> > > Here's an example where nested pm_stay_awake calls are needed.
> > > Suppose a USB root hub is enabled for wakeup (in other words, plugging
> > > or unplugging a USB device into the computer should wake the system
> > > up).  Suppose the root hub is at low power (autosuspended) when a
> > > wakeup event occurs.  The interrupt handler calls pm_stay_awake and
> > > indirectly invokes the USB runtime-resume routine.  This routine
> > > brings the root hub back to full power, informs the hub driver, and
> > > then calls pm_relax.
> > >
> > > But the hub driver carries out its main work in a separate kernel
> > > thread (i.e., khubd).  So the hub_resume routine has to call
> > > pm_stay_awake, and then khubd calls pm_relax when it is finished
> > > processing the root hub's event.  Thus we end up with:
> > >
> > >        usb_runtime_resume:     call pm_stay_awake (1)
> > >                                call hub_resume
> > >                hub_resume:             call pm_stay_awake (2)
> > >        usb_runtime_resume:     call pm_relax (balances 1)
> > >        khubd:                  call pm_relax (balances 2)
> > >
> > > As you can see, this won't work right if the pm wakeup calls aren't
> > > cumulative.  ("Nested" isn't quite the right word.)  The system would
> > > be allowed to sleep before khubd could process the root hub's wakeup
> > > event.
> > >
> > 
> > To me this is an example of where two handles are needed, not where
> > nesting is needed. Yes, you can use a single nested handle here, but
> > the stats will not tell you which driver blocked suspend if there is a
> > problem.
> 
> It sounds like you're suggesting that handles should be allocated 
> per-driver rather than per-device.  That would tell you which driver 
> blocked suspend, but not which device.
> 
> This raises a few other questions.
> 
> With Android-based handsets, you're in a setting where you have a
> pretty good idea of what all the wakeup sources are and how they might
> get used.  On a general system this isn't so.  Many devices are
> potential wakeup sources, even though only a few of them end up getting
> enabled.  In the example above, I said "Suppose a USB root hub is
> enabled for wakeup".  But usually people won't want to do this; they
> don't want their suspended laptops to wake up when a USB mouse is
> unplugged or plugged in.
> 
> Allocating these wakeup structures for all possible wakeup sources
> would be a big waste.

Indeed.

> On the other hand, a driver generally doesn't
> know until a suspend starts whether or not its devices should be
> enabled for wakeup.  That makes things difficult, especially if you
> contemplate having multiple structures for each device.  Have you
> really sat down and thought about what wakelocks would be needed on a
> general system such as a desktop or laptop machine?
> 
> Also, I frankly have to wonder why you demand such a large amount of
> detailed debugging information about the kernel's wakeup/wakelock subsystem (as
> opposed to userspace wakelocks).

That also is my question.

> Yes, it's clear that if something
> goes wrong then you will drain your phone's battery in a few hours
> rather than a few days.  But what if something goes wrong with the VM
> subsystem, or the RF subsystem, or the keypad driver?  The device would
> crash immediately or at best become almost totally unusable.  These are
> much worse scenarios than losing some battery life -- have you added
> comparable quantities of debugging to those other parts of the kernel?
> 
> In addition, you retain all these debugging facilities in your
> production builds, not just in the debug kernels.  Doesn't this seem
> excessive?

It definitely seems so.

Thanks,
Rafael