On 11/07/2022 15:33, Conor Dooley wrote: > Hey Uwe, took another look at this today. > I'll give you some more info on the sync_update hw when I send > the next version. Ehh, I don't think that'll be needed - I had misconfigured my devicetree while testing that change & that resulted in only one of the writes to sync_update actually happening. That just happened to be the one that didn't sleep, so the fact that I used a spinlock for my lock didn't cause any problems. Will switch to a mutex and remove one of the writes to the sync update register... /facepalm, Conor. > > On 09/07/2022 17:02, Uwe Kleine-König wrote: >> Hello Conor, >> >> On Fri, Jul 08, 2022 at 03:39:22PM +0100, Conor Dooley wrote: >>> Add a driver that supports the Microchip FPGA "soft" PWM IP core. >>> >>> Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx> >>> --- > >>> +static int mchp_core_pwm_apply_period(struct pwm_chip *chip, const struct pwm_state *state, >>> + u8 *prescale, u8 *period_steps) >>> +{ >>> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip); >>> + u64 tmp, clk_rate; >>> + u16 prescale_val, period_steps_val; >>> + >>> + /* >>> + * Calculate the period cycles and prescale values. >>> + * The registers are each 8 bits wide & multiplied to compute the period >>> + * using the formula: >>> + * (clock_period) * (prescale + 1) * (period_steps + 1) >>> + * so the maximum period that can be generated is 0x10000 times the >>> + * period of the input clock. >>> + * However, due to the design of the "hardware", it is not possible to >>> + * attain a 100% duty cycle if the full range of period_steps is used. >>> + * Therefore period_steps is restricted to 0xFE and the maximum multiple >>> + * of the clock period attainable is 0xFF00. >>> + */ >>> + clk_rate = clk_get_rate(mchp_core_pwm->clk); >> >> + /* If clk_rate is too big, the following multiplication might overflow */ > > > I expanded this comment slightly to: > /* > * If clk_rate is too big, the following multiplication might overflow. > * However this is implausible, as the fabric of current FPGAs cannot > * provide clocks at a rate high enough. > */ > >>> + if (clk_rate >= NSEC_PER_SEC) >>> + return -EINVAL; >>> + >>> + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC); >>> + >>> + if (tmp >= MCHPCOREPWM_PERIOD_MAX) { >>> + *prescale = MCHPCOREPWM_PRESCALE_MAX - 1; >>> + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1; >>> + goto write_registers; >>> + } >>> + >>> + for (prescale_val = 1; prescale_val <= MCHPCOREPWM_PRESCALE_MAX; prescale_val++) { >>> + period_steps_val = div_u64(tmp, prescale_val); >>> + if (period_steps_val > MCHPCOREPWM_PERIOD_STEPS_MAX) >>> + continue; >>> + *period_steps = period_steps_val - 1; >>> + *prescale = prescale_val - 1; >>> + break; >>> + } >> >> OK, so you want to find the smallest prescale_val such that >> >> prescale_val * MCHPCOREPWM_PERIOD_STEPS_MAX >= tmp >> >> . You can calculate that without a loop as: >> >> prescale_val = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); > > Hmm, not quite right. For tmp < MCHPCOREPWM_PERIOD_STEPS_MAX this gives > zero. It would have to be: > *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); > > In which case, the loop collapses to: > > *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX); > /* PREG_TO_VAL() can produce a value larger than UINT8_MAX */ > *period_steps = div_u64(tmp, PREG_TO_VAL((u32) *prescale)) - 1; > > and the interim _val variables can just be deleted. > > >>> +static int mchp_core_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm, >>> + const struct pwm_state *state) >>> +{ >>> + struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip); >>> + struct pwm_state current_state = pwm->state; >>> + bool period_locked; >>> + u64 period; >>> + u16 channel_enabled; >>> + u8 prescale, period_steps; >>> + int ret; >>> + >>> + if (!state->enabled) { >>> + mchp_core_pwm_enable(chip, pwm, false); >>> + return 0; >>> + } >>> + >>> + /* >>> + * If the only thing that has changed is the duty cycle or the polarity, >>> + * we can shortcut the calculations and just compute/apply the new duty >>> + * cycle pos & neg edges >>> + * As all the channels share the same period, do not allow it to be >>> + * changed if any other channels are enabled. >>> + */ >>> + spin_lock(&mchp_core_pwm->lock); >>> + >>> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) | >>> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0))); >>> + period_locked = channel_enabled & ~(1 << pwm->hwpwm); >>> + >>> + if ((!current_state.enabled || current_state.period != state->period) && !period_locked) { >>> + ret = mchp_core_pwm_apply_period(chip, state, &prescale, &period_steps); >>> + if (ret) { >>> + spin_unlock(&mchp_core_pwm->lock); >>> + return ret; >>> + } >>> + } else { >>> + prescale = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE); >>> + period_steps = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PERIOD); >>> + } >>> + >>> + /* >>> + * If the period is locked, it may not be possible to use a period less >>> + * than that requested. >>> + */ >>> + period = PREG_TO_VAL(period_steps) * PREG_TO_VAL(prescale) * NSEC_PER_SEC; >> >> s/ / / >> >>> + do_div(period, clk_get_rate(mchp_core_pwm->clk)); >>> + if (period > state->period) { >>> + spin_unlock(&mchp_core_pwm->lock); >>> + return -EINVAL; >>> + } >> >> I would consider it easier to do the following (in pseudo syntax): >> >> >> prescale, period_steps = calculate_hwperiod(period); >> >> if (period_locked): >> hwprescale = readb_relaxed(PRESCALE) >> hwperiod_steps = readb_relaxed(PERIOD) >> >> if period_steps * prescale < hwperiod_steps * hwprescale: >> return -EINVAL >> else >> prescale, period_steps = hwprescale, >> hwperiod_steps > > I think I like this method more than messing around with the clks. > I'll change to something like this for the next version. > >> duty_steps = calculate_hwduty(duty, prescale) >> if (duty_steps > period_steps) >> duty_steps = period_steps > > >> >>> + mchp_core_pwm_apply_duty(chip, pwm, state, prescale, period_steps); >>> + >>> + /* >>> + * Notify the block to update the waveform from the shadow registers. >>> + * The updated values will not appear on the bus until they have been >>> + * applied to the waveform at the beginning of the next period. We must >>> + * write these registers and wait for them to be applied before calling >>> + * enable(). >>> + */ >>> + if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm)) { >>> + writel_relaxed(1U, mchp_core_pwm->base + MCHPCOREPWM_SYNC_UPD); >>> + usleep_range(state->period, state->period * 2); >>> + } >>> + >>> + spin_unlock(&mchp_core_pwm->lock); >>> + >>> + mchp_core_pwm_enable(chip, pwm, true); >> >> I already asked in the last round: Do you really need to write the >> SYNC_UPD register twice? I would expect that you don't?! >> >> Also the locking looks fishy here. It would be simpler (and maybe even >> more robust, didn't think deeply about it) to assume in >> mchp_core_pwm_enable() that the caller holds the lock. Then you only >> grab the lock once during .apply() and nothing strange can happen in the >> gap. > > I got it into my head that enable() could be called by the framework. > I'll simplify the locking here. > >> I'd take the lock here to be sure to get a consistent return value. >> >>> + channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) | >>> + readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0))); >> >> micro optimisation: You're reading two register values here, but only use >> one. Shadowing the enabled registers in mchp_core_pwm might also be an >> idea. > > I'd agree, but more from the perspective of how awful I feel this code > looks. > >> >>> + if (channel_enabled & 1 << pwm->hwpwm) >> >> I'm always unsure about the associativity of & and <<, so I would have >> written that as >> >> if (channel_enabled & (1 << pwm->hwpwm)) >> >> I just tested that for the umpteens time and your code is fine, so this >> is only for human readers like me. > > I'll change it, I'll prob have forgotten the associativity by the time I > look at the driver next. > >> >>> + state->enabled = true; >>> + else >>> + state->enabled = false; >>> + >>> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE)); >>> + >>> + posedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm)); >>> + negedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm)); >>> + >>> + duty_steps = abs((s16)posedge - (s16)negedge); >> >> If duty_steps == 0 the returned result is wrong. I suggest to fix that, >> at least mention the problem in a comment. > > I think handling it is the way to go. > >> >>> + state->duty_cycle = duty_steps * prescale * NSEC_PER_SEC; >> >> Can this overflow? > > No, 255 * 256 * 1E9 < 2^64 but ... > >>> + prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE)); > > ... can. > >>>> What is the typical return value of clk_get_rate(mchp_core_pwm->clk)? >>> >>> It's gonna be less than 600M >> >> An exact value would be interesting, then when I spot a rounding problem >> I could give you a test case to double check. > > The maximum depends on speed grade, but no more than 200 MHz. > > >>>>> You need to round up here. Did you test your driver with PWM_DEBUG on? >>>>> During test please do for a few fixed periods: >>>>> >>>>> for duty_cycle in [0 .. period]: >>>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...}) >>>>> >>>>> for duty_cycle in [period .. 0]: >>>>> pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...}) >>>>> >>>>> and check there is no output claiming a miscalculation. >>>> >>>> I ran the stuff you gave me last time, doing something similar w/ a >>>> shell loop. Got no reported miscalculations. >>> >>> I'm surprise, I would have expected that my test recipe would find such >>> an issue. Could you follow my arguing about the rounding direction? >>> There always the possibility that I'm wrong, too. >> >> I'll take another look & get back to you. > > I *think* I might've spelt "PWM_DEBUG" as "PWM_DEBUF"... > > I'll retest! > > Thanks, > Conor.