Re: [PATCH 0/8] thermal:cpu cooling:fix: Provide thermal core fixes with deferred probe for several drivers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Lukasz,

Thanks for the keeping this up. And apologize for late answer.

On Thu, Nov 13, 2014 at 06:02:37PM +0100, Lukasz Majewski wrote:
> Presented fixes are a response for problem described below:
> http://thread.gmane.org/gmane.linux.kernel/1793821/match=thermal+core+fix+initialize+max_state+variable+0
> 
> In short - it turned out that two trivial fixes (included in this patch set)
> require support for deferred probe in thermal drivers.
> 
> This situation shows up when CPU frequency reduction is used as a thermal cooling
> device for a thermal zone.
> It happens that during initialization, the call to thermal probe will be executed
> before cpufreq probe (it can be observed at ./drivers/Makefile).
> In such a situation thermal will not be properly configured until cpufreq policy
> is setup.
> 
> In the current code (without included fixes) there is a time window in which thermal
> can try to use not configured cpufreq and possibly crash the system.
> 
> 
> Proposed solution was based on the code already available in the imx_thermal.c file.
> 
> /db8500_thermal.c:                      -> NOT NEEDED
> /intel_powerclamp.c:                    -> NOT NEEDED - INTEL (x86)
> /intel_powerclamp.c:                    -> NOT NEEDED - INTEL (x86)
> /ti-soc-thermal/ti-bandgap.c:           -> FIXED  [omap2plus_defconfig]
> /dove_thermal.c:                        -> NOT NEEDED - CPU_COOLING NOT AVAILABLE
>                                                         [dove_defconfig]
> /spear_thermal.c:                       -> FIXED  [spear3xx_defconfig]
> /samsung/exynos_tmu.c:                  -> NOT NEEDED (nasty hack - will be reworked in later patches)
> /imx_thermal.c:                         -> OK (deferred probe already in place)
> /int340x_thermal/int3402_thermal.c:     -> NOT NEEDED - ACPI x86 - Intel specific
> /int340x_thermal/int3400_thermal.c:     -> NOT NEEDED - ACPI x86 - Intel specific
> /tegra_soctherm.c:                      -> FIXED  [tegra_defconfig]
> /kirkwood_thermal.c:                    -> FIXED  [multi_v5_defconfig]
> /armada_thermal.c:                      -> FIXED  [multi_v7_defconfig]
> /rcar_thermal.c:                        -> FIXED  [shmobile_defconfig]
> /db8500_cpufreq_cooling.c:              -> OK (deferred probe already in place) [multi_v7_defconfig]
> /st/st_thermal_syscfg.c:                -> NOT NEEDED (Those two are enabled by e.g. ARMADA)
> /st/st_thermal_memmap.c:
> 
> 


Instead of doing the same check on all drivers in the need for cpu
cooling looks like a promiscuous solution. What if we do this check in
cpu cooling itself and we propagate the error in callers code? 

From what I see, only exynos does not propagate the error. And we would
need a tweak in the cpufreq-dt code. Something like the following
(not tested):

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index f657c57..f139247 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -181,7 +181,6 @@ static int cpufreq_init(struct cpufreq_policy *policy)
 {
 	struct cpufreq_dt_platform_data *pd;
 	struct cpufreq_frequency_table *freq_table;
-	struct thermal_cooling_device *cdev;
 	struct device_node *np;
 	struct private_data *priv;
 	struct device *cpu_dev;
@@ -264,20 +263,6 @@ static int cpufreq_init(struct cpufreq_policy *policy)
 		goto out_free_priv;
 	}
 
-	/*
-	 * For now, just loading the cooling device;
-	 * thermal DT code takes care of matching them.
-	 */
-	if (of_find_property(np, "#cooling-cells", NULL)) {
-		cdev = of_cpufreq_cooling_register(np, cpu_present_mask);
-		if (IS_ERR(cdev))
-			dev_err(cpu_dev,
-				"running cpufreq without cooling device: %ld\n",
-				PTR_ERR(cdev));
-		else
-			priv->cdev = cdev;
-	}
-
 	priv->cpu_dev = cpu_dev;
 	priv->cpu_reg = cpu_reg;
 	policy->driver_data = priv;
@@ -287,7 +272,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
 	if (ret) {
 		dev_err(cpu_dev, "%s: invalid frequency table: %d\n", __func__,
 			ret);
-		goto out_cooling_unregister;
+		goto free_table;
 	}
 
 	policy->cpuinfo.transition_latency = transition_latency;
@@ -300,8 +285,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
 
 	return 0;
 
-out_cooling_unregister:
-	cpufreq_cooling_unregister(priv->cdev);
+free_table:
 	dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
 out_free_priv:
 	kfree(priv);
@@ -342,11 +326,14 @@ static struct cpufreq_driver dt_cpufreq_driver = {
 
 static int dt_cpufreq_probe(struct platform_device *pdev)
 {
+	struct device_node *np;
 	struct device *cpu_dev;
 	struct regulator *cpu_reg;
 	struct clk *cpu_clk;
 	int ret;
 
+	/* at this point we checked the pointer already right? */
+	np = of_node_get(pdev->dev.of_node);
 	/*
 	 * All per-cluster (CPUs sharing clock/voltages) initialization is done
 	 * from ->init(). In probe(), we just need to make sure that clk and
@@ -368,6 +355,28 @@ static int dt_cpufreq_probe(struct platform_device *pdev)
 	if (ret)
 		dev_err(cpu_dev, "failed register driver: %d\n", ret);
 
+	/*
+	 * For now, just loading the cooling device;
+	 * thermal DT code takes care of matching them.
+	 */
+	if (of_find_property(np, "#cooling-cells", NULL)) {
+		struct cpufreq_policy policy;
+		struct private_data *priv;
+		struct thermal_cooling_device *cdev;
+
+		/* TODO: can cpu0 be always used ? */
+		cpufreq_get_policy(&policy, 0);
+		priv = policy.driver_data;
+		cdev = of_cpufreq_cooling_register(np, cpu_present_mask);
+		if (IS_ERR(cdev))
+			dev_err(cpu_dev,
+				"running cpufreq without cooling device: %ld\n",
+				PTR_ERR(cdev));
+		else
+			priv->cdev = cdev;
+	}
+	of_node_put(np);
+
 	return ret;
 }
 
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 1ab0018..342eb9e 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -440,6 +440,11 @@ __cpufreq_cooling_register(struct device_node *np,
 	int ret = 0, i;
 	struct cpufreq_policy policy;
 
+	if (!cpufreq_get_current_driver()) {
+		dev_warn(&pdev->dev, "no cpufreq driver, deferring.");
+		return -EPROBE_DEFER;
+	}
+
 	/* Verify that all the clip cpus have same freq_min, freq_max limit */
 	for_each_cpu(i, clip_cpus) {
 		/* continue if cpufreq policy not found and not return error */
diff --git a/drivers/thermal/samsung/exynos_thermal_common.c b/drivers/thermal/samsung/exynos_thermal_common.c
index 3f5ad25..f84975e 100644
--- a/drivers/thermal/samsung/exynos_thermal_common.c
+++ b/drivers/thermal/samsung/exynos_thermal_common.c
@@ -373,7 +373,7 @@ int exynos_register_thermal(struct thermal_sensor_conf *sensor_conf)
 		if (IS_ERR(th_zone->cool_dev[th_zone->cool_dev_size])) {
 			dev_err(sensor_conf->dev,
 				"Failed to register cpufreq cooling device\n");
-			ret = -EINVAL;
+			ret = PTR_ERR(th_zone->cool_dev[th_zone->cool_dev_size]);
 			goto err_unregister;
 		}
 		th_zone->cool_dev_size++;


The above way, we avoid having same test in every driver that needs it.
Besides, it makes sense the cpu_cooling code takes care of this check,
as it is the very first part that has direct dependency with cpufreq.

> I only possess Exynos boards and Beagle Bone Black, so I'd be grateful for
> testing proposed solution on other boards. The posted code is compile tested.
> 
> This code applies on Eduardo's ti-soc-thermal-next tree:
> SHA1: 208a97042d66d9bfbcfab0d4a00c9fe317bb73d3
> 
> Lukasz Majewski (8):
>   thermal:cpu cooling:armada: Provide deferred probing for armada driver
>   thermal:cpu cooling:kirkwood: Provide deferred probing for kirkwood
>     driver
>   thermal:cpu cooling:rcar: Provide deferred probing for rcar driver
>   thermal:cpu cooling:spear: Provide deferred probing for spear driver
>   thermal:cpu cooling:tegra: Provide deferred probing for tegra driver
>   thermal:cpu cooling:ti: Provide deferred probing for ti drivers
>   thermal:core:fix: Initialize the max_state variable to 0
>   thermal:core:fix: Check return code of the ->get_max_state() callback
> 
>  drivers/thermal/armada_thermal.c            | 7 +++++++
>  drivers/thermal/kirkwood_thermal.c          | 7 +++++++
>  drivers/thermal/rcar_thermal.c              | 7 +++++++
>  drivers/thermal/spear_thermal.c             | 7 +++++++
>  drivers/thermal/tegra_soctherm.c            | 7 +++++++
>  drivers/thermal/thermal_core.c              | 8 +++++---
>  drivers/thermal/ti-soc-thermal/ti-bandgap.c | 7 +++++++
>  7 files changed, 47 insertions(+), 3 deletions(-)
> 
> -- 
> 2.0.0.rc2
> 

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux