Re: [PATCH v5 3/4] pwm: add microchip soft ip corePWM driver

<Conor.Dooley@xxxxxxxxxxxxx> · Tue, 12 Jul 2022 10:57:25 +0000

On 11/07/2022 15:33, Conor Dooley wrote:
> Hey Uwe, took another look at this today.
> I'll give you some more info on the sync_update hw when I send
> the next version.

Ehh, I don't think that'll be needed - I had misconfigured my
devicetree while testing that change & that resulted in only one
of the writes to sync_update actually happening.
That just happened to be the one that didn't sleep, so the fact
that I used a spinlock for my lock didn't cause any problems.

Will switch to a mutex and remove one of the writes to the sync
update register...

/facepalm,
Conor.

> 
> On 09/07/2022 17:02, Uwe Kleine-König wrote:
>> Hello Conor,
>>
>> On Fri, Jul 08, 2022 at 03:39:22PM +0100, Conor Dooley wrote:
>>> Add a driver that supports the Microchip FPGA "soft" PWM IP core.
>>>
>>> Signed-off-by: Conor Dooley <conor.dooley@xxxxxxxxxxxxx>
>>> ---
> 
>>> +static int mchp_core_pwm_apply_period(struct pwm_chip *chip, const struct pwm_state *state,
>>> +                      u8 *prescale, u8 *period_steps)
>>> +{
>>> +    struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>>> +    u64 tmp, clk_rate;
>>> +    u16 prescale_val, period_steps_val;
>>> +
>>> +    /*
>>> +     * Calculate the period cycles and prescale values.
>>> +     * The registers are each 8 bits wide & multiplied to compute the period
>>> +     * using the formula:
>>> +     * (clock_period) * (prescale + 1) * (period_steps + 1)
>>> +     * so the maximum period that can be generated is 0x10000 times the
>>> +     * period of the input clock.
>>> +     * However, due to the design of the "hardware", it is not possible to
>>> +     * attain a 100% duty cycle if the full range of period_steps is used.
>>> +     * Therefore period_steps is restricted to 0xFE and the maximum multiple
>>> +     * of the clock period attainable is 0xFF00.
>>> +     */
>>> +    clk_rate = clk_get_rate(mchp_core_pwm->clk);
>>
>> +    /* If clk_rate is too big, the following multiplication might overflow */
> 
> 
> I expanded this comment slightly to:
> /*
> * If clk_rate is too big, the following multiplication might overflow.
> * However this is implausible, as the fabric of current FPGAs cannot
> * provide clocks at a rate high enough.
> */
> 
>>> +    if (clk_rate >= NSEC_PER_SEC)
>>> +        return -EINVAL;
>>> +
>>> +    tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
>>> +
>>> +    if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
>>> +        *prescale = MCHPCOREPWM_PRESCALE_MAX - 1;
>>> +        *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1;
>>> +        goto write_registers;
>>> +    }
>>> +
>>> +    for (prescale_val = 1; prescale_val <= MCHPCOREPWM_PRESCALE_MAX; prescale_val++) {
>>> +        period_steps_val = div_u64(tmp, prescale_val);
>>> +        if (period_steps_val > MCHPCOREPWM_PERIOD_STEPS_MAX)
>>> +            continue;
>>> +        *period_steps = period_steps_val - 1;
>>> +        *prescale = prescale_val - 1;
>>> +        break;
>>> +    }
>>
>> OK, so you want to find the smallest prescale_val such that
>>
>>     prescale_val * MCHPCOREPWM_PERIOD_STEPS_MAX >= tmp
>>
>> . You can calculate that without a loop as:
>>
>>     prescale_val = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);
> 
> Hmm, not quite right. For tmp < MCHPCOREPWM_PERIOD_STEPS_MAX this gives
> zero. It would have to be:
> *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);
> 
> In which case, the loop collapses to:
> 
> *prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);
> /* PREG_TO_VAL() can produce a value larger than UINT8_MAX */
> *period_steps = div_u64(tmp, PREG_TO_VAL((u32) *prescale)) - 1;
> 
> and the interim _val variables can just be deleted.
> 
> 
>>> +static int mchp_core_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>>> +                   const struct pwm_state *state)
>>> +{
>>> +    struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>>> +    struct pwm_state current_state = pwm->state;
>>> +    bool period_locked;
>>> +    u64 period;
>>> +    u16 channel_enabled;
>>> +    u8 prescale, period_steps;
>>> +    int ret;
>>> +
>>> +    if (!state->enabled) {
>>> +        mchp_core_pwm_enable(chip, pwm, false);
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * If the only thing that has changed is the duty cycle or the polarity,
>>> +     * we can shortcut the calculations and just compute/apply the new duty
>>> +     * cycle pos & neg edges
>>> +     * As all the channels share the same period, do not allow it to be
>>> +     * changed if any other channels are enabled.
>>> +     */
>>> +    spin_lock(&mchp_core_pwm->lock);
>>> +
>>> +    channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>>> +        readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
>>> +    period_locked = channel_enabled & ~(1 << pwm->hwpwm);
>>> +
>>> +    if ((!current_state.enabled || current_state.period != state->period) && !period_locked) {
>>> +        ret = mchp_core_pwm_apply_period(chip, state, &prescale, &period_steps);
>>> +        if (ret) {
>>> +            spin_unlock(&mchp_core_pwm->lock);
>>> +            return ret;
>>> +        }
>>> +    } else {
>>> +        prescale = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE);
>>> +        period_steps = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PERIOD);
>>> +    }
>>> +
>>> +    /*
>>> +     * If the period is locked, it may not be possible to use a period less
>>> +     * than that requested.
>>> +     */
>>> +    period = PREG_TO_VAL(period_steps) *  PREG_TO_VAL(prescale) * NSEC_PER_SEC;
>>
>> s/  / /
>>
>>> +    do_div(period, clk_get_rate(mchp_core_pwm->clk));
>>> +    if (period > state->period) {
>>> +        spin_unlock(&mchp_core_pwm->lock);
>>> +        return -EINVAL;
>>> +    }
>>
>> I would consider it easier to do the following (in pseudo syntax):
>>
>>
>>     prescale, period_steps = calculate_hwperiod(period);
>>
>>     if (period_locked):
>>         hwprescale = readb_relaxed(PRESCALE)
>>         hwperiod_steps = readb_relaxed(PERIOD)
>>
>>         if period_steps * prescale < hwperiod_steps * hwprescale:
>>             return -EINVAL
>>         else
>>             prescale, period_steps = hwprescale,
>>             hwperiod_steps
> 
> I think I like this method more than messing around with the clks.
> I'll change to something like this for the next version.
> 
>>     duty_steps = calculate_hwduty(duty, prescale)
>>     if (duty_steps > period_steps)
>>         duty_steps = period_steps
> 
> 
>>
>>> +    mchp_core_pwm_apply_duty(chip, pwm, state, prescale, period_steps);
>>> +
>>> +    /*
>>> +     * Notify the block to update the waveform from the shadow registers.
>>> +     * The updated values will not appear on the bus until they have been
>>> +     * applied to the waveform at the beginning of the next period. We must
>>> +     * write these registers and wait for them to be applied before calling
>>> +     * enable().
>>> +     */
>>> +    if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm)) {
>>> +        writel_relaxed(1U, mchp_core_pwm->base + MCHPCOREPWM_SYNC_UPD);
>>> +        usleep_range(state->period, state->period * 2);
>>> +    }
>>> +
>>> +    spin_unlock(&mchp_core_pwm->lock);
>>> +
>>> +    mchp_core_pwm_enable(chip, pwm, true);
>>
>> I already asked in the last round: Do you really need to write the
>> SYNC_UPD register twice? I would expect that you don't?!
>>
>> Also the locking looks fishy here. It would be simpler (and maybe even
>> more robust, didn't think deeply about it) to assume in
>> mchp_core_pwm_enable() that the caller holds the lock. Then you only
>> grab the lock once during .apply() and nothing strange can happen in the
>> gap.
> 
> I got it into my head that enable() could be called by the framework.
> I'll simplify the locking here.
> 
>> I'd take the lock here to be sure to get a consistent return value.
>>
>>> +    channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>>> +        readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
>>
>> micro optimisation: You're reading two register values here, but only use
>> one. Shadowing the enabled registers in mchp_core_pwm might also be an
>> idea.
> 
> I'd agree, but more from the perspective of how awful I feel this code
> looks.
> 
>>
>>> +    if (channel_enabled & 1 << pwm->hwpwm)
>>
>> I'm always unsure about the associativity of & and <<, so I would have
>> written that as
>>
>>     if (channel_enabled & (1 << pwm->hwpwm))
>>
>> I just tested that for the umpteens time and your code is fine, so this
>> is only for human readers like me.
> 
> I'll change it, I'll prob have forgotten the associativity by the time I
> look at the driver next.
> 
>>
>>> +        state->enabled = true;
>>> +    else
>>> +        state->enabled = false;
>>> +
>>> +    prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));
>>> +
>>> +    posedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm));
>>> +    negedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm));
>>> +
>>> +    duty_steps = abs((s16)posedge - (s16)negedge);
>>
>> If duty_steps == 0 the returned result is wrong. I suggest to fix that,
>> at least mention the problem in a comment.
> 
> I think handling it is the way to go.
> 
>>
>>> +    state->duty_cycle = duty_steps * prescale * NSEC_PER_SEC;
>>
>> Can this overflow?
> 
> No, 255 * 256 * 1E9 < 2^64 but ...
> 
>>> +    prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));
> 
> ... can.
> 
>>>> What is the typical return value of clk_get_rate(mchp_core_pwm->clk)?
>>>
>>> It's gonna be less than 600M
>>
>> An exact value would be interesting, then when I spot a rounding problem
>> I could give you a test case to double check.
> 
> The maximum depends on speed grade, but no more than 200 MHz.
> 
> 
>>>>> You need to round up here. Did you test your driver with PWM_DEBUG on?
>>>>> During test please do for a few fixed periods:
>>>>>
>>>>>     for duty_cycle in [0 .. period]:
>>>>>         pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>>
>>>>>     for duty_cycle in [period .. 0]:
>>>>>         pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>>
>>>>> and check there is no output claiming a miscalculation.
>>>>
>>>> I ran the stuff you gave me last time, doing something similar w/ a
>>>> shell loop. Got no reported miscalculations.
>>>
>>> I'm surprise, I would have expected that my test recipe would find such
>>> an issue. Could you follow my arguing about the rounding direction?
>>> There always the possibility that I'm wrong, too.
>>
>> I'll take another look & get back to you.
> 
> I *think* I might've spelt "PWM_DEBUG" as "PWM_DEBUF"...
> 
> I'll retest!
> 
> Thanks,
> Conor.