Re: [PATCH] Documentation/power: Update docs about suspend and CPU hotplug

"Rafael J. Wysocki" <rjw@xxxxxxx> · Sun, 16 Oct 2011 00:42:40 +0200

On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> On 10/13/2011 12:49 AM, Rafael J. Wysocki wrote:
> > On Wednesday, October 12, 2011, Srivatsa S. Bhat wrote:
> >> On 10/12/2011 03:32 AM, Rafael J. Wysocki wrote:
> >>> On Tuesday, October 11, 2011, Srivatsa S. Bhat wrote:
> >>>> Update the documentation about the interaction between the suspend (S3) call
> >>>> path and the CPU hotplug infrastructure.
> >>>> This patch focusses only on the activities of the freezer, cpu hotplug and
> >>>> the notifications involved. It outlines how regular CPU hotplug differs from
> >>>> the way it is invoked during suspend and also tries to explain the locking
> >>>> involved.
> >>>>
> >>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
> >>>> ---
> >>>>
> >>>>  Documentation/power/00-INDEX                   |    2 
> >>>>  Documentation/power/suspend-and-cpuhotplug.txt |  113 ++++++++++++++++++++++++
> >>>>  2 files changed, 115 insertions(+), 0 deletions(-)
> >>>>  create mode 100644 Documentation/power/suspend-and-cpuhotplug.txt
> >>>>
> >>>> diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX
> >>>> index 45e9d4a..a4d682f 100644
> >>>> --- a/Documentation/power/00-INDEX
> >>>> +++ b/Documentation/power/00-INDEX
> >>>> @@ -26,6 +26,8 @@ s2ram.txt
> >>>>  	- How to get suspend to ram working (and debug it when it isn't)
> >>>>  states.txt
> >>>>  	- System power management states
> >>>> +suspend-and-cpuhotplug.txt
> >>>> +	- Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug
> >>>>  swsusp-and-swap-files.txt
> >>>>  	- Using swap files with software suspend (to disk)
> >>>>  swsusp-dmcrypt.txt
> >>>> diff --git a/Documentation/power/suspend-and-cpuhotplug.txt b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> new file mode 100644
> >>>> index 0000000..d0ba411
> >>>> --- /dev/null
> >>>> +++ b/Documentation/power/suspend-and-cpuhotplug.txt
> >>>> @@ -0,0 +1,113 @@
> >>>> +Interaction of Suspend code (S3) with the CPU hotplug infrastructure
> >>>> +   (C) 2011 Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>, GPL
> >>>> +
> >>>> +
> >>>> +I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> >>>> +
> >>>> +Well, a picture speaks more than a thousand words... So ASCII art follows :-)
> >>>> +
> >>>> +[This depicts the current design in the kernel, and focusses only on the
> >>>> +interactions between suspend call paths involving the freezer and cpu hotplug
> >>>> +and also tries to explain the locking involved. It also outlines the
> >>>> +notifications involved.]
> >>>> +
> >>>> +On a high level, the suspend-resume cycle goes like this:
> >>>> +
> >>>> +|Freeze| -> |Disable nonboot| -> |Do suspend| -> |Enable nonboot| -> |Thaw |
> >>>> +|tasks |    |     cpus      |    |          |    |     cpus     |    |tasks|
> >>>> +
> >>>> +
> >>>> +More details follow:
> >>>> +
> >>>> +Regular CPU hotplug                                   Suspend call path
> >>>> +-------------------                              ---------------------------
> >>>> +
> >>>> +Write 0 (or 1) to                                     Write 'mem' to
> >>>> +/sys/devices/system/cpu/cpu*/online                   /sys/power/state
> >>>> +     sysfs file                                          sysfs file
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                         Acquire pm_mutex lock
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                Send PM_SUSPEND_PREPARE notifications
> >>>> +         |                                                   |
> >>>> +         |                                                   v
> >>>> +         |                                              Freeze tasks
> >>>
> >>> OK, so something appears to be missing here.  Namely, the task writing to
> >>> /sys/devices/system/cpu/cpu*/online should be frozen at this point or
> >>> suspend should be aborted.  I suppose neither of these happens and I wonder
> >>> why exactly.
> >>>
> >>
> >> I have a couple of clarifications to make here:
> >> * Firstly, this picture is not meant to represent what happens when regular
> >>   cpu hotplug and suspend run together. That race condition has not been
> >>   brought out here. What it does try to explain is, how the regular cpu
> >>   hotplug path is different from suspend, and where they share common code.
> >>   Please don't think about timing/race conditions when reading it. It's just
> >>   meant to explain the call path and locking involved.
> > 
> > Well, I didn't understand this part.  And the question above is:
> > 
> >> I. How does the Suspend-to-RAM code interact with CPU hotplug infrastructure?
> > 
> > which kind of suggests something different from what you're saying.  Care to
> > clarify that in the document?
> > 
> 
> Ok, I get it. I'll clarify the question and post the next version.
> 
> >> * Secondly, this picture explains the *current* design, and *not* the mutual
> >>   exclusion design I have proposed between regular cpu hotplug and suspend.
> >>   The reason being, this doc was written to help everyone understand the
> >>   current locking schemes, to help evaluate my proposal for a different
> >>   scheme (mutual exclusion).
> > 
> > I understand that.
> > 
> >> Now, coming to your point, if that task writing to the sysfs file has not
> >> been frozen, then the current kernel doesn't abort suspend, which is why we are
> >> encountering problems, and which is exactly what my patchset tries to solve.
> >> Link to my patchset:
> >> http://thread.gmane.org/gmane.linux.documentation/3414/focus=3414
> > 
> > This isn't my point, actually.  My point is that the task writing to
> > /sys/devices/system/cpu/cpu*/online should be frozen by the freezer.
> > If it is not frozen, then the freezer should fail.  If that doesn't
> > happen, there's a bug that has to be fixed and it is _not_ the lack
> > of mutual exclusion.  The bug is that, apparently, suspend continues
> > even though there is an unfrozen user space process in the system.
> > 
> > Do you have any idea why that happens?
> > 
> 
> Sorry, I think I explained it wrong above. The freezer doesn't have any bugs
> in this context. If it fails to freeze the tasks, suspend does get aborted.
> 
> But the point here is, suppose the task writing to the 'online' sysfs
> file has already entered the kernel, and _now_ the freezer started
> freezing tasks, it might encounter trouble in freezing that cpu hotplug
> operation that has already begun (because the cpu hotplug online operation
> waits on the frozen userspace to get microcode). 
> 
> So, to clarify again, regular cpu hotplug and the cpu hotplug operations
> carried out during suspend are properly serialized in the current kernel.
> We have no problems there.
> 
> The problem is with freezer and regular cpu hotplug racing with each other,
> as illustrated in this scenario:
>   * the regular cpu online operation continues its journey from userspace
>     into the kernel, since the freezing has not yet begun.
>   * then freezer starts and freezes userspace.
>   * If cpu online has not yet completed the microcode update stuff by now,
>     it will now start waiting on the frozen userspace unfortunately.
>   * Now the freezer continues and tries to freeze the remaining tasks. But
>     due to this wait mentioned above, the freezer won't be able to freeze
>     the cpu online hotplug task and hence freezing of tasks fails.
>   [This race condition is where the whole problem lies.]
> 
> And if freezing of tasks fails, then suspend gets aborted. So no problems
> there again.

That pretty much is how it is supposed to be.
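
Just to make sure we mean the same thing, the scenario above can be written
down as a toy model. This is only a sketch in Python, not kernel code: the
Task class, freeze_tasks() and try_suspend() are names made up for
illustration, and the real freezer retries until a timeout instead of making
a single pass.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    # Hypothetical field: the userspace helper this task is waiting on, if any.
    blocked_on: Optional["Task"] = None
    frozen: bool = False


def freeze_tasks(tasks: List[Task]) -> bool:
    """Toy freezer: one pass over the tasks, thaw everything on failure."""
    for task in tasks:
        # A task still waiting for a reply never reaches a freeze point.
        if task.blocked_on is None:
            task.frozen = True
    if all(task.frozen for task in tasks):
        return True
    for task in tasks:
        task.frozen = False  # freezing failed, so undo it
    return False


def try_suspend(helper_replied_first: bool) -> str:
    """Model one suspend attempt racing with a CPU online operation."""
    helper = Task("microcode-helper")
    writer = Task("cpu-online-writer", blocked_on=helper)
    if helper_replied_first:
        # The microcode request was served before freezing began.
        writer.blocked_on = None
    return "suspended" if freeze_tasks([helper, writer]) else "aborted"


if __name__ == "__main__":
    print(try_suspend(helper_replied_first=True))
    print(try_suspend(helper_replied_first=False))
```

In the second call the helper is frozen while the writer still waits on it,
so the freezer cannot finish and the suspend attempt is aborted.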

Do I understand correctly that you're attempting to make suspend always
succeed if a CPU hotplug stress test is run in parallel with it?

Rafael