Re: [PATCH v2] pwm: tegra: Optimize period calculation

Jon Hunter <jonathanh@xxxxxxxxxx> · Tue, 25 Oct 2022 15:22:18 +0100

Uwe, Thierry,

On 25/04/2022 14:22, Uwe Kleine-König wrote:
Dividing by the result of a division looses precision because the result is
rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
following numbers result:

	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144

The exact result would be 6142.5061875 and (apart from rounding) this is
found by using a single division. As a side effect is also a tad
cheaper to calculate.

Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
example clk_rate = 47999999 and period = 106667:

	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
			    NSEC_PER_SEC) = 19

	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20

(The exact result is 20.000062083332033.)

With this optimizations also switch from round-closest to round-down for
the period calculation. Given that the calculations were non-optimal for
quite some time now with variations in both directions which nobody
reported as a problem, this is the opportunity to align the driver's
behavior to the requirements of new drivers. This has several upsides:

  - Implementation is easier as there are no round-nearest variants of
    mul_u64_u64_div_u64().
  - Requests for too small periods are now consistently refused. This was
    kind of arbitrary before, where period_ns < min_period_ns was
    refused, but in some cases min_period_ns isn't actually implementable
    and then values between min_period_ns and the actual minimum were
    rounded up to the actual minimum.

Note that the duty_cycle calculation isn't using the usual round-down
approach yet.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
---
Hello,

changes since (implicit) v1: Updated changelog to explain why rate = 0
is refused now.

Best regards
Uwe

  drivers/pwm/pwm-tegra.c | 10 +++++-----
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
index e5a9ffef4a71..7fc03a9ec154 100644
--- a/drivers/pwm/pwm-tegra.c
+++ b/drivers/pwm/pwm-tegra.c
@@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
  			    int duty_ns, int period_ns)
  {
  	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
-	unsigned long long c = duty_ns, hz;
+	unsigned long long c = duty_ns;
  	unsigned long rate, required_clk_rate;
  	u32 val = 0;
  	int err;
@@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
  		pc->clk_rate = clk_get_rate(pc->clk);
  	}
  
-	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
-
  	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
-	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
-	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
+	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
+				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
  
  	/*
  	 * Since the actual PWM divider is the register's frequency divider
@@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
  	 */
  	if (rate > 0)
  		rate--;
+	else
+		return -EINVAL;


I am seeing more problems with this ...

1. In the case where we call dev_pm_opp_set_rate() to set the PWM clock
   rate, the requested rate is calculated as ...

  required_clk_rate = (NSEC_PER_SEC / period_ns) << PWM_DUTY_WIDTH;

   For Tegra234, I have a case where the period is 45334 and so the
   above yields a rate of 5646848Hz. In this case, mul_u64_u64_div_u64()
   returns 0 because ...

   (5646848 * 45334)/(NSEC_PER_SEC << PWM_DUTY_WIDTH) = 0.9999

   We can fix this by ...

  required_clk_rate = (NSEC_PER_SEC << PWM_DUTY_WIDTH) / period_ns;

   The above produces a rate of 5646975 vs 5646848Hz.

2. Even after fixing issue #1, above, I then ran into another issue
   where even if I request a clock rate of 5646975 I still get a lower
   clock. This also causes  mul_u64_u64_div_u64() to return 0 and then I
   see the -EINVAL returned. The simple solution here is to not return
   -EINVAL for 0. After all 0, could be valid if the rate if we are
   not dividing down the parent clock in the PWM. However, then there is
   always the case we are not operating at the expected rate. Seems like
   we should be rounding the result.

Thoughts?

Jon

--
nvpublic