Hello Saravana,

you were pointed out to me as the expert for device links. I found a
problem with these.

On Tue, Jul 25, 2023 at 11:10:04PM +0200, Uwe Kleine-König wrote:
> Today I managed to trigger the problem I intend to address with this
> series. My machine to test this on is an stm32mp157. To be able to
> trigger the problem reliably I applied the following patches on top of
> v6.5-rc1:
>
>  - pwm: stm32: Don't modify HW state in .remove() callback
>    This is a cleanup that I already sent out.
>    https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@xxxxxxxxxxxxxx
>    The purpose for reproducing the problem is to not trigger further
>    calls to the apply callback.
>
>  - The following patch:
>
>     diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
>     index 687967d3265f..c7fc02b0fa3c 100644
>     --- a/drivers/pwm/pwm-stm32.c
>     +++ b/drivers/pwm/pwm-stm32.c
>     @@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>      	struct stm32_pwm *priv = to_stm32_pwm_dev(chip);
>      	int ret;
>
>     +	dev_info(chip->dev, "%s:%d\n", __func__, __LINE__);
>     +	msleep(5000);
>     +	dev_info(chip->dev, "%s:%d\n", __func__, __LINE__);
>     +
>      	enabled = pwm->state.enabled;
>
>      	if (enabled && !state->enabled) {
>     @@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev)
>      {
>      	struct stm32_pwm *priv = platform_get_drvdata(pdev);
>
>     +	dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__);
>      	pwmchip_remove(&priv->chip);
>     +	dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__);
>     +
>     +	priv->regmap = NULL;
>      }
>
>      static int __maybe_unused stm32_pwm_suspend(struct device *dev)
>
> The first hunk is only there to widen the race window. The second is to
> give some diagnostics and make stm32_pwm_apply() crash if it continues
> to run after the msleep. (Without it it didn't crash reproducibly, don't
> understand why. *shrug*)
>
> The device tree contains a pwm-fan device making use of one of the PWMs.
>
> Now I do the following:
>
>     echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind
>
> Unbinding the fan device has two effects:
>
>  - The device link between fan and pwm loses its property to unbind fan
>    when pwm gets unbound.
>    (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE)
>  - It calls pwm_fan_cleanup() which triggers a call to
>    pwm_apply_state().
>
> So when the pwm device gets unbound the first thread is sleeping in
> stm32_pwm_apply(). The driver calls pwmchip_remove() and sets
> priv->regmap to NULL. Then a few seconds later the first thread wakes up
> in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang!
>
> This looks as follows:
>
>     root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind
>     [ 187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454
>     [ 188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657
>     [ 188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659
>     root@crown:~# [ 192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456
>     [ 192.240727] 8<--- cut here ---
>     [ 192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read
>     ...
>
> Even without the crash you can see that stm32_pwm_apply() is still
> running after pwmchip_remove() completed.
>
> I'm unsure if the device link could be improved here to ensure that the
> fan is completely unbound even if it started unbinding already before
> the pwm device gets unbound. (And if it could, would this fit the device
> links' purpose and so be a sensible improvement?)

While I think that there is something to be done in the pwm core so that
this doesn't explode (i.e.
do proper lifetime tracking such that a pwm_chip doesn't disappear while
still being used, and I'm working on that), I expected that the device
links between pwm consumer and provider would prevent the above-described
oops, too. But somehow the fan already going away (but still using the
PWM) when the PWM is unbound results in the PWM disappearing before the
fan is completely gone.

Is this expected, or a problem that can (and should?) be fixed?

If you need more context or a tester, don't hesitate to ask.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |
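
For illustration only, here is a minimal user-space sketch of the kind of
lifetime tracking mentioned above. It is not the pwm core code: struct
model_chip and all model_chip_*() names are hypothetical, a pthread mutex
stands in for whatever locking the core would use, and a plain int stands
in for priv->regmap. The idea is that apply runs under a lock and checks
an "operational" flag, so removal either waits for an in-flight apply or
makes later calls fail with -ENODEV, while a reference held by the
consumer keeps the memory alive.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_chip {
	pthread_mutex_t lock;	/* serializes apply against removal */
	bool operational;	/* false once the provider is unbound */
	int refcount;		/* provider + consumer references */
	int *regmap;		/* stands in for priv->regmap */
};

/* Provider side: create the chip holding the provider's reference. */
struct model_chip *model_chip_create(int *regmap)
{
	struct model_chip *chip = calloc(1, sizeof(*chip));

	if (!chip)
		return NULL;
	pthread_mutex_init(&chip->lock, NULL);
	chip->operational = true;
	chip->refcount = 1;
	chip->regmap = regmap;
	return chip;
}

/* Consumer side: take a reference so the chip memory outlives unbind. */
void model_chip_get(struct model_chip *chip)
{
	pthread_mutex_lock(&chip->lock);
	chip->refcount++;
	pthread_mutex_unlock(&chip->lock);
}

/* Drop a reference; returns true if this call freed the chip. */
bool model_chip_put(struct model_chip *chip)
{
	bool last;

	pthread_mutex_lock(&chip->lock);
	assert(chip->refcount > 0);
	last = (--chip->refcount == 0);
	pthread_mutex_unlock(&chip->lock);

	if (last) {
		pthread_mutex_destroy(&chip->lock);
		free(chip);
	}
	return last;
}

/* Touch the hardware only while the provider is still bound. */
int model_chip_apply(struct model_chip *chip, int value)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&chip->lock);
	if (chip->operational) {
		*chip->regmap = value;
		ret = 0;
	}
	pthread_mutex_unlock(&chip->lock);
	return ret;
}

/*
 * Provider unbind: taking the lock waits for an in-flight apply, then
 * later applies are rejected. Returns true if the chip was freed.
 */
bool model_chip_remove(struct model_chip *chip)
{
	pthread_mutex_lock(&chip->lock);
	chip->operational = false;
	chip->regmap = NULL;
	pthread_mutex_unlock(&chip->lock);

	return model_chip_put(chip);
}
```

In this model a consumer that called model_chip_get() before the provider
ran model_chip_remove() can still call model_chip_apply() afterwards; it
gets -ENODEV instead of dereferencing a NULL regmap, and the memory is
only freed by the consumer's final model_chip_put().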