Re: [PATCH v2] pwm: tegra: Optimize period calculation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Jon,

On Tue, Oct 25, 2022 at 03:22:18PM +0100, Jon Hunter wrote:
> On 25/04/2022 14:22, Uwe Kleine-König wrote:
> > Dividing by the result of a division looses precision because the result is
> > rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> > following numbers result:
> > 
> > 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> > 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> > 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> > 
> > The exact result would be 6142.5061875 and (apart from rounding) this is
> > found by using a single division. As a side effect is also a tad
> > cheaper to calculate.
> > 
> > Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> > example clk_rate = 47999999 and period = 106667:
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> > 			    NSEC_PER_SEC) = 19
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> > 
> > (The exact result is 20.000062083332033.)
> > 
> > With this optimizations also switch from round-closest to round-down for
> > the period calculation. Given that the calculations were non-optimal for
> > quite some time now with variations in both directions which nobody
> > reported as a problem, this is the opportunity to align the driver's
> > behavior to the requirements of new drivers. This has several upsides:
> > 
> >   - Implementation is easier as there are no round-nearest variants of
> >     mul_u64_u64_div_u64().
> >   - Requests for too small periods are now consistently refused. This was
> >     kind of arbitrary before, where period_ns < min_period_ns was
> >     refused, but in some cases min_period_ns isn't actually implementable
> >     and then values between min_period_ns and the actual minimum were
> >     rounded up to the actual minimum.
> > 
> > Note that the duty_cycle calculation isn't using the usual round-down
> > approach yet.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> > ---
> > Hello,
> > 
> > changes since (implicit) v1: Updated changelog to explain why rate = 0
> > is refused now.
> > 
> > Best regards
> > Uwe
> > 
> >   drivers/pwm/pwm-tegra.c | 10 +++++-----
> >   1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > index e5a9ffef4a71..7fc03a9ec154 100644
> > --- a/drivers/pwm/pwm-tegra.c
> > +++ b/drivers/pwm/pwm-tegra.c
> > @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   			    int duty_ns, int period_ns)
> >   {
> >   	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> > -	unsigned long long c = duty_ns, hz;
> > +	unsigned long long c = duty_ns;
> >   	unsigned long rate, required_clk_rate;
> >   	u32 val = 0;
> >   	int err;
> > @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   		pc->clk_rate = clk_get_rate(pc->clk);
> >   	}
> > -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> > -
> >   	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> > -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> > -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> > +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
> >   	/*
> >   	 * Since the actual PWM divider is the register's frequency divider
> > @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   	 */
> >   	if (rate > 0)
> >   		rate--;
> > +	else
> > +		return -EINVAL;
> 
> 
> I am seeing more problems with this ...
> 
> 1. In the case where we call dev_pm_opp_set_rate() to set the PWM clock
>    rate, the requested rate is calculated as ...
> 
>   required_clk_rate = (NSEC_PER_SEC / period_ns) << PWM_DUTY_WIDTH;
> 
>    For Tegra234, I have a case where the period is 45334 and so the
>    above yields a rate of 5646848Hz. In this case, mul_u64_u64_div_u64()
> 
>    returns 0 because ...
> 
>    (5646848 * 45334)/(NSEC_PER_SEC << PWM_DUTY_WIDTH) = 0.9999
> 
>    We can fix this by ...
> 
>   required_clk_rate = (NSEC_PER_SEC << PWM_DUTY_WIDTH) / period_ns;
> 
>    The above produces a rate of 5646975 vs 5646848Hz.

You probably also want to round up that division such that you work with
5646976.

> 2. Even after fixing issue #1, above, I then ran into another issue
>    where even if I request a clock rate of 5646975 I still get a lower
>    clock. This also causes  mul_u64_u64_div_u64() to return 0 and then I
>    see the -EINVAL returned. The simple solution here is to not return
>    -EINVAL for 0. After all 0, could be valid if the rate if we are
>    not dividing down the parent clock in the PWM. However, then there is

As you have to subtract 1 from the result of the division, you need to
write a -1 in the register which doesn't work. Writing a 0 results in a
bigger period than requested which is the thing I intended to fix with
the blamed commit.

If clk_rate was 5646976 (don't know if that matches as nicely as it
should? If dev_pm_opp_set_rate might set a lower rate you're out of luck
again) we get:

	rate = mul_u64_u64_div_u64(5646976, 45334, 1000000000 << 8)

which is 1 and everything is fine.

Note that the math before my commit already had that problem. Before the
driver was just more lax and didn't subtract 1 from rate and so
configured a bigger period than requested/expected

There are probably similar cases where the driver still configures a too
big period (i.e. when the effect you described yields a rate that is too
small but bigger than 0).

So the optimal way to fix this is to adapt the calculation of
required_clk_rate as you suggested + rounding up and to make sure that
dev_pm_opp_set_rate doesn't configure a rate lower than requested. The
latter might be complicated as the API (AFAIK) isn't specific about
rounding directions and there isn't a dev_pm_opp_round_rate function.

As a short term work-around dropping the -EINVAL is probably best, I'd
like to have it documented clearly however that continuing in that case
is wrong and yields unexpected settings.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux