Re: [RFC][PATCH -mm 3/3] Freezer: Replace the timeout

"Rafael J. Wysocki" <rjw@xxxxxxx> · Wed, 1 Aug 2007 12:43:24 +0200

On Wednesday, 1 August 2007 10:31, Pavel Machek wrote:
> Hi!
> 
> > Instead of using the global timeout, we can use a more fine grained method of
> > checking if the freezing of tasks should fail.  Namely, we can measure the time
> > in which no tasks have entered the refrigerator by counting the number of calls
> > to wait_event_timeout() in try_to_freeze_tasks() that have returned 0 (in a
> > row).
> > 
> > After sending freeze requests to the tasks regarded as freezable
> > try_to_freeze_tasks() goes to sleep and waits until at least one task enters the
> > refrigerator.  If the refrigerator is not entered by any tasks before WAIT_TIME
> > expires, try_to_freeze_tasks() increases the counter of expired timeouts and
> > sends freeze requests to the remaining tasks.  If the number of expired timeouts
> > becomes greater than MAX_WAITS, the freezing of tasks fails (the counter of
> > expired timeouts is reset whenever a task enters the refrigerator).
> 
> I do not get logic behind this.
> 
> Old logic was "we give system 20 seconds to come into quiet state".
> 
> New logic is "if we do no progress within second, we fail"... which is
> quite a big change.

Well, I agree, and that's why I wanted to separate this part from the two
previous patches ...

> What happens on loaded ext3 filesystem, for example? Bunch of userland tasks
> will wait on data to be synced to disk, taking more than second, no?

IMHO this is only a question of what the value of MAX_WAITS should be.
[I took 5 because it turned out to be enough in my testing, but that could be
10 or more.]

The point is that in 99.(9)% of cases the 20s timeout is unnecessary, because:
(1) most often we succeed within 1s
(2) if we are going to fail, we can tell that well before the 20s
    expire.
Now, the question is how we can check that we'll fail, and this patch attempts
to use a simple mechanism:
* measure the time in which no tasks have entered the refrigerator and if this
  time is long enough, we can safely assume the "blocking" tasks to be stuck
  somewhere and give up.
This isn't bulletproof, but it should cover the vast majority of cases.
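
To illustrate the counting rule, here is a minimal userspace sketch in C.  It
is not the code from the patch: simulate_freeze() and its arguments are
hypothetical names made up for this example, and each array element merely
stands for one wait of WAIT_TIME (0 means the wait expired with no task
entering the refrigerator).  MAX_WAITS is set to 5, the value mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Value mentioned in this thread; the right value is still open to debate. */
#define MAX_WAITS 5

/*
 * Hypothetical model of the proposed logic, not the kernel code.
 *
 * entered[i] is the number of tasks that entered the refrigerator during
 * the i-th wait of WAIT_TIME; 0 means that wait timed out with no progress.
 * todo is the number of tasks that still have to freeze.
 *
 * Returns 0 if all tasks froze, -1 if more than MAX_WAITS waits expired in
 * a row (or if the recorded waits ran out before everything froze).
 */
int simulate_freeze(const unsigned int *entered, size_t nr_waits,
                    unsigned int todo)
{
    unsigned int expired = 0;   /* consecutive expired timeouts */
    size_t i;

    assert(entered != NULL || nr_waits == 0);

    for (i = 0; i < nr_waits && todo > 0; i++) {
        if (entered[i] == 0) {
            /* No progress: count the timeout and give up if too many. */
            if (++expired > MAX_WAITS)
                return -1;
            /* The real code would resend freeze requests here. */
            continue;
        }
        /* Progress was made: reset the counter of expired timeouts. */
        expired = 0;
        todo -= entered[i] < todo ? entered[i] : todo;
    }

    return todo == 0 ? 0 : -1;
}
```

With this model, a slow but steady sequence such as three expired waits
followed by one frozen task, repeated, still succeeds, because the counter is
reset on every bit of progress.  Only more than MAX_WAITS expired waits in a
row make it give up.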

Anyway, eventually, I'd like the freezer to detect failures relatively early,
so the user won't have to wait 20s each time the freezing is going to fail.

Greetings,
Rafael

-- 
"Premature optimization is the root of all evil." - Donald Knuth
