Hi Uwe Kleine-König, Thanks for the feedback. > -----Original Message----- > From: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> > Sent: Thursday, December 7, 2023 9:42 PM > Subject: Re: [PATCH v17 3/4] pwm: Add support for RZ/G2L GPT > > Hello Biju, > > On Thu, Dec 07, 2023 at 06:26:44PM +0000, Biju Das wrote: > > > -----Original Message----- > > > From: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> > > > Sent: Wednesday, December 6, 2023 6:38 PM > > > Subject: Re: [PATCH v17 3/4] pwm: Add support for RZ/G2L GPT On Mon, > > > Nov 20, 2023 at 11:33:06AM +0000, Biju Das wrote: > > > > +static u64 calculate_period_or_duty(struct rzg2l_gpt_chip > > > > +*rzg2l_gpt, > > > > +u32 val, u8 prescale) { > > > > + u64 retval; > > > > + u64 tmp; > > > > + > > > > + tmp = NSEC_PER_SEC * (u64)val; > > > > + /* > > > > + * To avoid losing precision for smaller period/duty cycle > values > > > > + * ((2^32 * 10^9 << 2) < 2^64), do not process the rounded > values. > > > > + */ > > > > + if (prescale < 2) > > > > + retval = DIV64_U64_ROUND_UP(tmp << (2 * prescale), > rzg2l_gpt- > > > >rate); > > > > + else > > > > + retval = DIV64_U64_ROUND_UP(tmp, rzg2l_gpt->rate) << (2 > * > > > > +prescale); > > > > > > Maybe introduce a mul_u64_u64_div64_roundup (similar to > > > mul_u64_u64_div64) to also be exact for prescale >= 2? > > > > mul_u64_u64_div_u64()has a bug already and we are losing precision with > very high values. > > See the output with CONFIG_PWM_DEBUG enabled[1]. So we cannot reuse > > mul_u64_u64_div64() for roundup operation. > > > > Maybe we need to come with 128bit operations for > mul_u64_u64_div64_roundup(). > > Do you have any specific algorithm in mind for doing > mul_u64_u64_div64_roundup()? > > > > Fabrizio already developed an algorithm for 128 bit operation, I could > > reuse once it hits the mainline. > > > > [1] > > ######[ 304.213944] pwm-rzg2l-gpt 10048000.pwm: .apply is not > idempotent (ena=1 pol=0 5500000000000/43980352512000) -> (ena=1 pol=0 > 5500000000000/43980239923200) > > High setting## > > [ 304.230854] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent > > (ena=1 pol=0 23980465100800/43980352512000) -> (ena=1 pol=0 > > 23980465100800/43980239923200) > > Have you tried to understand that? What is the clk rate when this happens? > You're not suggesting that mul_u64_u64_div_u64 is wrong, are you? mul_u64_u64_div_u64() works for certain values. But for very high values we are losing precision and is giving unexpected values. See [1] test program [2] with mul_u64_u32_div() [3] with mul_u64_u64_div_u64 [1] test_prescale_1024() { echo "prescale 1024" echo "###### Low setting##" echo ${IO_1} > /sys/class/pwm/$PWMCHIP/export echo 10995116288000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period echo 6995116277760 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle echo 1 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/enable dumpgptreg echo "###### Medium setting 11000 sec = 3hours ##" echo 11000000000000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period echo 5500000000000 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle dumpgptreg echo "###### High setting##" echo 43980465100800 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period echo 23980465100800 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle dumpgptreg echo 0 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/duty_cycle echo 10 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/period echo 0 > /sys/class/pwm/$PWMCHIP/pwm${IO_1}/enable echo ${IO_1} > /sys/class/pwm/$PWMCHIP/unexport } [2] root@smarc-rzv2l:~# /pwm.sh prescale 1024 ###### Low setting## Read at address 0x10048464 (0xffffa114c464): 0x40000001 Read at address 0x1004844C (0xffff8f06d44c): 0x28B78918 Read at address 0x1004842C (0xffffbf46e42c): 0x05000001 ###### Medium setting 11000 sec = 3hours ## Read at address 0x10048464 (0xffffb3491464): 0x400746FE Read at address 0x1004844C (0xffffb0bbd44c): 0x2003A37F Read at address 0x1004842C (0xffff86b2d42c): 0x05000001 ###### High setting## Read at address 0x10048464 (0xffffad78f464): 0xFFFFFFFF Read at address 0x1004844C (0xffff8499344c): 0x8B95AD77 Read at address 0x1004842C (0xffffba33e42c): 0x05000001 [ 502.409228] test end root@smarc-rzv2l:~# [3] root@smarc-rzv2l:~# /pwm.sh prescale 1024 ###### Low setting## Read at address 0x10048464 (0xffffb283e464): 0x40000001 Read at address 0x1004844C (0xffff8156044c): 0x28B78918 Read at address 0x1004842C (0xffffaf54242c): 0x05000001 ###### Medium setting 11000 sec = 3hours ## Read at address 0x10048464 (0xffff86d71464): 0x400746FE Read at address 0x1004844C (0xffff8179244c): 0x2003A37F Read at address 0x1004842C (0xffff8a58442c): 0x05000001 [ 1167.749392] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent (ena=1 pol=0 5500000000000/43980352512000) -> (ena=1 pol=0 5500000000000/43980239923200) ###### High setting## [ 1167.765592] pwm-rzg2l-gpt 10048000.pwm: .apply is not idempotent (ena=1 pol=0 23980465100800/43980352512000) -> (ena=1 pol=0 23980465100800/43980239923200) Read at address 0x10048464 (0xffff987cc464): 0xFFFFAA19 Read at address 0x1004844C (0xffff8fedd44c): 0x8B95AD77 Read at address 0x1004842C (0xffffb124642c): 0x05000001 [ 1169.949153] test end root@smarc-rzv2l:~# > > > > With prescale <= 5 and rzg2l_gpt->rate >= 1024 this shouldn't work > > > just fine. > > I meant here "this should work just fine", I guess you understood that > right. Got it. > > > Practically this is not possible. Yes, from theoretical point, > > rate=1kHz compared to the practical case, 100MHz won't work. > > > > For the practical case, the current logic is fine. If we need to > > achieve theoretical example like you mentioned then we need to have > > mul_u64_u64_div64_roundup(). > > That shouldn't be so hard to implement. You need to implement 128-bit operation to support mul_u64_u64_div64_roundup(). Otherwise, we will lose precision and get unexpected results. For that you need to produce an algorithm and then implement the algorithm and testing various 32/64/128 build systems? > > > > > + rzg2l_gpt->max_val = mul_u64_u64_div_u64(U32_MAX, > NSEC_PER_SEC, > > > > + rzg2l_gpt->rate) * > RZG2L_MAX_SCALE_FACTOR; > > > > > > Is it clear that this won't overflow? > > U32_MAX * NSEC_PER_SEC doesn't even overflow an u64, so using a plain > u64 division would be more efficient. OK. > > You'd get a smaller rounding error with: > > rzg2l_gpt->max_val = mul_u64_u64_div_u64((u64)U32_MAX * > NSEC_PER_SEC, RZG2L_MAX_SCALE_FACTOR, rzg2l_gpt->rate); > I agree. With above logic, it will overflow and due to precision adjustment there will be rounding error. 2^32 * 10^9 * 1024 = 4,398,046,511,104,000,000,000 > 2^64. Cheers, Biju