[PATCH v2 5/5] cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end

"Srivatsa S. Bhat" <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> · Tue, 29 Apr 2014 00:25:12 +0530

Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

One important aspect to consider is that it is valid for ASYNC_NOTIFICATION
drivers to invoke _begin() from one task and then invoke the corresponding
_end() in another task. We take care of this scenario by setting the value
of 'transition_task' once again explicitly in the _end() function. This
helps us handle the particular tricky scenario (shown below) that can occur
in ASYNC_NOTIFICATION drivers:

Scenario 1: (Deadlock-free)
----------

         Task A						Task B

    /* 1st freq transition */
    Invoke _begin() {
            ...
            ...
    }

    Change the frequency

						Got interrupt for successful
						change of frequency.

						/* 1st freq transition */
						Invoke _end() {
							...
							...
    /* 2nd freq transition */    			...
    Invoke _begin() {					...
	    ...	//waiting for B				...
            ... //to finish _end()		}
	    ...
	    ...
    }


This scenario is actually deadlock-free because Task A can wait inside the
second call to _begin() without self-deadlocking, because it is the
responsibility of Task B to finish the first sequence by invoking the
corresponding _end().

By setting the value of 'transition_task' again explicitly in _end(), we
ensure that the code won't print a false-positive warning in this case.

However the same code successfully catches the following deadlock-prone
scenario even in ASYNC_NOTIFICATION drivers:

Scenario 2: (Deadlock-prone)
----------

         Task A						Task B

    /* 1st freq transition */
    Invoke _begin() {
            ...
            ...
    }

    /* 2nd freq transition */
    Invoke _begin() {
	    ...
	    ...
    }

    Change the frequency


Here the bug is that Task A called the second _begin() *before* actually
performing the 1st frequency transition. In other words, it failed to set
Task B in motion for the 1st frequency transition, and hence it will
self-deadlock. This is very similar to the case of drivers which do
synchronous notification, and hence the debug infrastructure developed
in this patch can catch this scenario easily.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>
---

 drivers/cpufreq/cpufreq.c |   12 ++++++++++++
 include/linux/cpufreq.h   |    1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index abda660..2c99a6c 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -354,6 +354,10 @@ static void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
 void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs)
 {
+
+	/* Catch double invocations of _begin() which lead to self-deadlock */
+	WARN_ON(current == policy->transition_task);
+
 wait:
 	wait_event(policy->transition_wait, !policy->transition_ongoing);
 
@@ -365,6 +369,7 @@ wait:
 	}
 
 	policy->transition_ongoing = true;
+	policy->transition_task = current;
 
 	spin_unlock(&policy->transition_lock);
 
@@ -378,9 +383,16 @@ void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
 	if (unlikely(WARN_ON(!policy->transition_ongoing)))
 		return;
 
+	/*
+	 * The task invoking _end() could be different from the one that
+	 * invoked the _begin(). So set ->transition_task again here
+	 * explicity.
+	 */
+	policy->transition_task = current;
 	cpufreq_notify_post_transition(policy, freqs, transition_failed);
 
 	policy->transition_ongoing = false;
+	policy->transition_task = NULL;
 
 	wake_up(&policy->transition_wait);
 }
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5ae5100..8f44d79 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -110,6 +110,7 @@ struct cpufreq_policy {
 	bool			transition_ongoing; /* Tracks transition status */
 	spinlock_t		transition_lock;
 	wait_queue_head_t	transition_wait;
+	struct task_struct	*transition_task; /* Task which is doing the transition */
 };
 
 /* Only for ACPI */

--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html