Hey Uwe, took another look at this today. I'll give you some more info on the sync_update hw when I send the next version. On 09/07/2022 17:02, Uwe Kleine-König wrote: > Hello Conor, > > On Fri, Jul 08, 2022 at 03:39:22PM +0100, Conor Dooley wrote: >> Add a driver that supports the Microchip FPGA "soft" PWM IP core. >> >> Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx> >> --- >> +static int mchp_core_pwm_apply_period(struct pwm_chip *chip, const struct pwm_state *state, >> + u8 *prescale, u8 *period_steps) >> +{ >> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip); >> + u64 tmp, clk_rate; >> + u16 prescale_val, period_steps_val; >> + >> + /* >> + * Calculate the period cycles and prescale values. >> + * The registers are each 8 bits wide & multiplied to compute the period >> + * using the formula: >> + * (clock_period) * (prescale + 1) * (period_steps + 1) >> + * so the maximum period that can be generated is 0x10000 times the >> + * period of the input clock. >> + * However, due to the design of the "hardware", it is not possible to >> + * attain a 100% duty cycle if the full range of period_steps is used. >> + * Therefore period_steps is restricted to 0xFE and the maximum multiple >> + * of the clock period attainable is 0xFF00. >> + */ >> + clk_rate = clk_get_rate(mchp_core_pwm->clk); > > + /* If clk_rate is too big, the following multiplication might overflow */ I expanded this comment slightly to: /* * If clk_rate is too big, the following multiplication might overflow. * However this is implausible, as the fabric of current FPGAs cannot * provide clocks at a rate high enough. */ >> + if (clk_rate >= NSEC_PER_SEC) >> + return -EINVAL; >> + >> + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC); >> + >> + if (tmp >= MCHPCOREPWM_PERIOD_MAX) { >> + *prescale = MCHPCOREPWM_PRESCALE_MAX - 1; >> + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1; >> + goto write_registers; >> + } >> + >> + for (prescale_val = 1; prescale_val <= MCHPCOREPWM_PRESCALE_MAX; prescale_val++) { >> + period_steps_val = div_u64(tmp, prescale_val); >> + if (period_steps_val > MCHPCOREPWM_PERIOD_STEPS_MAX) >> + continue; >> + *period_steps = period_steps_val - 1; >> + *prescale = prescale_val - 1; >> + break; >> + } > > OK, so you want to find the smallest prescale_val such that > > prescale_val * MCHPCOREPWM_PERIOD_STEPS_MAX >= tmp > > . You can calculate that without a loop as: > > prescale_val = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); Hmm, not quite right. For tmp < MCHPCOREPWM_PERIOD_STEPS_MAX this gives zero. It would have to be: *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); In which case, the loop collapses to: *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); /* PREG_TO_VAL() can produce a value larger than UINT8_MAX */ *period_steps = div_u64(tmp, PREG_TO_VAL((u32) *prescale)) - 1; and the interim _val variables can just be deleted. >> +static int mchp_core_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, >> + const struct pwm_state *state) >> +{ >> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip); >> + struct pwm_state current_state = pwm->state; >> + bool period_locked; >> + u64 period; >> + u16 channel_enabled; >> + u8 prescale, period_steps; >> + int ret; >> + >> + if (!state->enabled) { >> + mchp_core_pwm_enable(chip, pwm, false); >> + return 0; >> + } >> + >> + /* >> + * If the only thing that has changed is the duty cycle or the polarity, >> + * we can shortcut the calculations and just compute/apply the new duty >> + * cycle pos & neg edges >> + * As all the channels share the same period, do not allow it to be >> + * changed if any other channels are enabled. >> + */ >> + spin_lock(&mchp_core_pwm->lock); >> + >> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) | >> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0))); >> + period_locked = channel_enabled & ~(1 << pwm->hwpwm); >> + >> + if ((!current_state.enabled || current_state.period != state->period) && !period_locked) { >> + ret = mchp_core_pwm_apply_period(chip, state, &prescale, &period_steps); >> + if (ret) { >> + spin_unlock(&mchp_core_pwm->lock); >> + return ret; >> + } >> + } else { >> + prescale = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE); >> + period_steps = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PERIOD); >> + } >> + >> + /* >> + * If the period is locked, it may not be possible to use a period less >> + * than that requested. >> + */ >> + period = PREG_TO_VAL(period_steps) * PREG_TO_VAL(prescale) * NSEC_PER_SEC; > > s/ / / > >> + do_div(period, clk_get_rate(mchp_core_pwm->clk)); >> + if (period > state->period) { >> + spin_unlock(&mchp_core_pwm->lock); >> + return -EINVAL; >> + } > > I would consider it easier to do the following (in pseudo syntax): > > > prescale, period_steps = calculate_hwperiod(period); > > if (period_locked): > hwprescale = readb_relaxed(PRESCALE) > hwperiod_steps = readb_relaxed(PERIOD) > > if period_steps * prescale < hwperiod_steps * hwprescale: > return -EINVAL > else > prescale, period_steps = hwprescale, > hwperiod_steps I think I like this method more than messing around with the clks. I'll change to something like this for the next version. > duty_steps = calculate_hwduty(duty, prescale) > if (duty_steps > period_steps) > duty_steps = period_steps > >> + mchp_core_pwm_apply_duty(chip, pwm, state, prescale, period_steps); >> + >> + /* >> + * Notify the block to update the waveform from the shadow registers. >> + * The updated values will not appear on the bus until they have been >> + * applied to the waveform at the beginning of the next period. We must >> + * write these registers and wait for them to be applied before calling >> + * enable(). >> + */ >> + if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm)) { >> + writel_relaxed(1U, mchp_core_pwm->base + MCHPCOREPWM_SYNC_UPD); >> + usleep_range(state->period, state->period * 2); >> + } >> + >> + spin_unlock(&mchp_core_pwm->lock); >> + >> + mchp_core_pwm_enable(chip, pwm, true); > > I already asked in the last round: Do you really need to write the > SYNC_UPD register twice? I would expect that you don't?! > > Also the locking looks fishy here. It would be simpler (and maybe even > more robust, didn't think deeply about it) to assume in > mchp_core_pwm_enable() that the caller holds the lock. Then you only > grab the lock once during .apply() and nothing strange can happen in the > gap. I got it into my head that enable() could be called by the framework. I'll simplify the locking here. > I'd take the lock here to be sure to get a consistent return value. > >> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) | >> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0))); > > micro optimisation: You're reading two register values here, but only use > one. Shadowing the enabled registers in mchp_core_pwm might also be an > idea. I'd agree, but more from the perspective of how awful I feel this code looks. > >> + if (channel_enabled & 1 << pwm->hwpwm) > > I'm always unsure about the associativity of & and <<, so I would have > written that as > > if (channel_enabled & (1 << pwm->hwpwm)) > > I just tested that for the umpteens time and your code is fine, so this > is only for human readers like me. I'll change it, I'll prob have forgotten the associativity by the time I look at the driver next. > >> + state->enabled = true; >> + else >> + state->enabled = false; >> + >> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE)); >> + >> + posedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm)); >> + negedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm)); >> + >> + duty_steps = abs((s16)posedge - (s16)negedge); > > If duty_steps == 0 the returned result is wrong. I suggest to fix that, > at least mention the problem in a comment. I think handling it is the way to go. > >> + state->duty_cycle = duty_steps * prescale * NSEC_PER_SEC; > > Can this overflow? No, 255 * 256 * 1E9 < 2^64 but ... >> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE)); ... can. >>> What is the typical return value of clk_get_rate(mchp_core_pwm->clk)? >> >> It's gonna be less than 600M > > An exact value would be interesting, then when I spot a rounding problem > I could give you a test case to double check. The maximum depends on speed grade, but no more than 200 MHz. >>>> You need to round up here. Did you test your driver with PWM_DEBUG on? >>>> During test please do for a few fixed periods: >>>> >>>> for duty_cycle in [0 .. period]: >>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...}) >>>> >>>> for duty_cycle in [period .. 0]: >>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...}) >>>> >>>> and check there is no output claiming a miscalculation. >>> >>> I ran the stuff you gave me last time, doing something similar w/ a >>> shell loop. Got no reported miscalculations. >> >> I'm surprise, I would have expected that my test recipe would find such >> an issue. Could you follow my arguing about the rounding direction? >> There always the possibility that I'm wrong, too. > > I'll take another look & get back to you. I *think* I might've spelt "PWM_DEBUG" as "PWM_DEBUF"... I'll retest! Thanks, Conor.