RE: [PATCH v17 3/4] pwm: Add support for RZ/G2L GPT

Biju Das <biju.das.jz@xxxxxxxxxxxxxx> · Fri, 8 Dec 2023 10:34:55 +0000

Hi Uwe Kleine-König,

Thanks for the feedback.

> -----Original Message-----
> From: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> Sent: Thursday, December 7, 2023 9:42 PM
> Subject: Re: [PATCH v17 3/4] pwm: Add support for RZ/G2L GPT
> 
> Hello Biju,
> 
> On Thu, Dec 07, 2023 at 06:26:44PM +0000, Biju Das wrote:
> > > -----Original Message-----
> > > From: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> > > Sent: Wednesday, December 6, 2023 6:38 PM
> > > Subject: Re: [PATCH v17 3/4] pwm: Add support for RZ/G2L GPT On Mon,
> > > Nov 20, 2023 at 11:33:06AM +0000, Biju Das wrote:
> > > > +static u64 calculate_period_or_duty(struct rzg2l_gpt_chip
> > > > +*rzg2l_gpt,
> > > > +u32 val, u8 prescale) {
> > > > +	u64 retval;
> > > > +	u64 tmp;
> > > > +
> > > > +	tmp = NSEC_PER_SEC * (u64)val;
> > > > +	/*
> > > > +	 * To avoid losing precision for smaller period/duty cycle
> values
> > > > +	 * ((2^32 * 10^9 << 2) < 2^64), do not process the rounded
> values.
> > > > +	 */
> > > > +	if (prescale < 2)
> > > > +		retval = DIV64_U64_ROUND_UP(tmp << (2 * prescale),
> rzg2l_gpt-
> > > >rate);
> > > > +	else
> > > > +		retval = DIV64_U64_ROUND_UP(tmp, rzg2l_gpt->rate) << (2
> *
> > > > +prescale);
> > >
> > > Maybe introduce a mul_u64_u64_div64_roundup (similar to
> > > mul_u64_u64_div64) to also be exact for prescale >= 2?
> >
> > mul_u64_u64_div_u64()has a bug already and we are losing precision with
> very high values.
> > See the output with CONFIG_PWM_DEBUG enabled[1]. So we cannot reuse
> > mul_u64_u64_div64() for roundup operation.
> >
> > Maybe we need to come with 128bit operations for
> mul_u64_u64_div64_roundup().
> > Do you have any specific algorithm in mind for doing
> mul_u64_u64_div64_roundup()?
> >
> > Fabrizio already developed an algorithm for 128 bit operation, I could
> > reuse once it hits the mainline.
> >
> > [1]
> > ######[  304.213944] pwm-rzg2l-gpt 10048000.pwm: .apply is not
> idempotent (ena=1 pol=0 5500000000000/43980352512000) -> (ena=1 pol=0
> 5500000000000/43980239923200)
> > 	 High setting##
> > 	[  304.230854] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent
> > (ena=1 pol=0 23980465100800/43980352512000) -> (ena=1 pol=0
> > 23980465100800/43980239923200)
> 
> Have you tried to understand that? What is the clk rate when this happens?
> You're not suggesting that mul_u64_u64_div_u64 is wrong, are you?

mul_u64_u64_div_u64() works for certain values. But for very high values we are losing precision
and is giving unexpected values.

See
[1] test program
[2] with mul_u64_u32_div()
[3] with mul_u64_u64_div_u64

[1]
test_prescale_1024() {
	echo "prescale 1024"
	echo "###### Low setting##"
	echo ${IO_1} > /sys/class/pwm/$PWMCHIP/export
	echo 10995116288000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period
	echo  6995116277760 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle
	echo 1 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/enable
	dumpgptreg

	echo "###### Medium setting 11000 sec = 3hours ##"
	echo 11000000000000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period
	echo  5500000000000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle
	dumpgptreg

	echo "###### High setting##"
	echo 43980465100800 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period
	echo 23980465100800 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle
	dumpgptreg
	echo 0 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle
	echo 10 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period
	echo 0 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/enable
	echo ${IO_1} > /sys/class/pwm/$PWMCHIP/unexport
}

[2]
root@smarc-rzv2l:~# /pwm.sh
prescale 1024
###### Low setting##
Read at address  0x10048464 (0xffffa114c464): 0x40000001
Read at address  0x1004844C (0xffff8f06d44c): 0x28B78918
Read at address  0x1004842C (0xffffbf46e42c): 0x05000001
###### Medium setting 11000 sec = 3hours ##
Read at address  0x10048464 (0xffffb3491464): 0x400746FE
Read at address  0x1004844C (0xffffb0bbd44c): 0x2003A37F
Read at address  0x1004842C (0xffff86b2d42c): 0x05000001
###### High setting##
Read at address  0x10048464 (0xffffad78f464): 0xFFFFFFFF
Read at address  0x1004844C (0xffff8499344c): 0x8B95AD77
Read at address  0x1004842C (0xffffba33e42c): 0x05000001
[  502.409228] test end
root@smarc-rzv2l:~#

[3]
root@smarc-rzv2l:~# /pwm.sh
prescale 1024
###### Low setting##
Read at address  0x10048464 (0xffffb283e464): 0x40000001
Read at address  0x1004844C (0xffff8156044c): 0x28B78918
Read at address  0x1004842C (0xffffaf54242c): 0x05000001
###### Medium setting 11000 sec = 3hours ##
Read at address  0x10048464 (0xffff86d71464): 0x400746FE
Read at address  0x1004844C (0xffff8179244c): 0x2003A37F
Read at address  0x1004842C (0xffff8a58442c): 0x05000001
[ 1167.749392] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent (ena=1 pol=0 5500000000000/43980352512000) -> (ena=1 pol=0 5500000000000/43980239923200)
###### High setting##
[ 1167.765592] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent (ena=1 pol=0 23980465100800/43980352512000) -> (ena=1 pol=0 23980465100800/43980239923200)
Read at address  0x10048464 (0xffff987cc464): 0xFFFFAA19
Read at address  0x1004844C (0xffff8fedd44c): 0x8B95AD77
Read at address  0x1004842C (0xffffb124642c): 0x05000001
[ 1169.949153] test end
root@smarc-rzv2l:~#

> 
> > > With prescale <= 5 and rzg2l_gpt->rate >= 1024 this shouldn't work
> > > just fine.
> 
> I meant here "this should work just fine", I guess you understood that
> right.

Got it.

> 
> > Practically this is not possible. Yes, from theoretical point,
> > rate=1kHz compared to the practical case, 100MHz won't work.
> >
> > For the practical case, the current logic is fine. If we need to
> > achieve theoretical example like you mentioned then we need to have
> > mul_u64_u64_div64_roundup().
> 
> That shouldn't be so hard to implement.

You need to implement 128-bit operation to support mul_u64_u64_div64_roundup().
Otherwise, we will lose precision and get unexpected results.

For that you need to produce an algorithm and then implement the algorithm
and testing various 32/64/128 build systems?

> 
> > > > +	rzg2l_gpt->max_val = mul_u64_u64_div_u64(U32_MAX,
> NSEC_PER_SEC,
> > > > +						 rzg2l_gpt->rate) *
> RZG2L_MAX_SCALE_FACTOR;
> > >
> > > Is it clear that this won't overflow?
> 
> U32_MAX * NSEC_PER_SEC doesn't even overflow an u64, so using a plain
> u64 division would be more efficient.

OK.

> 
> You'd get a smaller rounding error with:

> 
> 	rzg2l_gpt->max_val = mul_u64_u64_div_u64((u64)U32_MAX *
> NSEC_PER_SEC, RZG2L_MAX_SCALE_FACTOR, rzg2l_gpt->rate);
> 

I agree. With above logic, it will overflow and due to precision
adjustment there will be rounding error. 

2^32 * 10^9 * 1024 = 4,398,046,511,104,000,000,000 > 2^64.

Cheers,
Biju