Re: [PATCH 01/18] pwm: Provide devm_pwmchip_alloc() function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 19, 2023 at 09:59:29AM +0200, Thierry Reding wrote:
> On Tue, Jul 18, 2023 at 08:18:32PM +0200, Uwe Kleine-König wrote:
> > This function allocates a struct pwm_chip and driver data. Compared to
> > the status quo the split into pwm_chip and driver data is new, otherwise
> > it doesn't change anything relevant (yet).
> > 
> > The intention is that after all drivers are switched to use this
> > allocation function, its possible to add a struct device to struct
> > pwm_chip to properly track the latter's lifetime without touching all
> > drivers again. Proper lifetime tracking is a necessary precondition to
> > introduce character device support for PWMs (that implements atomic
> > setting and doesn't suffer from the sysfs overhead of the /sys/class/pwm
> > userspace support).
> > 
> > The new function pwmchip_priv() (obviously?) only works for chips
> > allocated with devm_pwmchip_alloc().
> 
> If this is supposed to be similar to the GPIO chardev, why doesn't GPIO
> require this way of allocating a struct gpio_chip? I'm not a fan of
> doing all this upfront work without seeing where this is ultimately
> headed. Please hold off on reworking everything until you have a
> complete proposal that can be reviewed in full.

I'm working on that and already have a patch stack with more than 100
patches, many of them are cleanups that I found while working on each
PWM driver (most of these are already posted to the linux-pwm list). The
biggest part is to convert each of the 50+ pwm drivers to
pwmchip_alloc().

Today I managed to trigger the problem I intend to address with this
series. My machine to test this on is an stm32mp157. To be able to
trigger the problem reliably I applied the following patches on top of
v6.5-rc1:

 - pwm: stm32: Don't modify HW state in .remove() callback
   This is a cleanup that I already sent out.
   https://lore.kernel.org/r/20230713155142.2454010-2-u.kleine-koenig@xxxxxxxxxxxxxx
   The purpose for reproducing the problem is to not trigger further
   calls to the apply callback.

 - The following patch:

diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c
index 687967d3265f..c7fc02b0fa3c 100644
--- a/drivers/pwm/pwm-stm32.c
+++ b/drivers/pwm/pwm-stm32.c
@@ -451,6 +451,10 @@ static int stm32_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
 	struct stm32_pwm *priv = to_stm32_pwm_dev(chip);
 	int ret;
 
+	dev_info(chip->dev, "%s:%d\n", __func__, __LINE__);
+	msleep(5000);
+	dev_info(chip->dev, "%s:%d\n", __func__, __LINE__);
+
 	enabled = pwm->state.enabled;
 
 	if (enabled && !state->enabled) {
@@ -650,7 +654,11 @@ static void stm32_pwm_remove(struct platform_device *pdev)
 {
 	struct stm32_pwm *priv = platform_get_drvdata(pdev);
 
+	dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__);
 	pwmchip_remove(&priv->chip);
+	dev_info(&pdev->dev, "%s:%d\n", __func__, __LINE__);
+
+	priv->regmap = NULL;
 }
 
 static int __maybe_unused stm32_pwm_suspend(struct device *dev)

The first hunk is only there to widen the race window. The second is to
give some diagnostics and make stm32_pwm_apply() crash if it continues
to run after the msleep. (Without it it didn't crash reproducibly, don't
understand why. *shrug*)

The device tree contains a pwm-fan device making use of one of the PWMs.

Now I do the following:

	echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind

Unbinding the fan device has two effects:

 - The device link between fan and pwm looses its property to unbind fan
   when pwm gets unbound.
   (Its .status changes from DL_STATE_ACTIVE to DL_STATE_AVAILABLE)
 - It calls pwm_fan_cleanup() which triggers a call to
   pwm_apply_state().

So when the pwm device gets unbound the first thread is sleeping in
stm32_pwm_apply(). The driver calls pwmchip_remove() and sets
priv->regmap to NULL. Then a few seconds later the first thread wakes up
in stm32_pwm_apply() with the chip freed and priv->regmap = NULL. Bang!

This looks as follows:

root@crown:~# echo fan > /sys/bus/platform/drivers/pwm-fan/unbind & sleep 1; echo 40007000.timer:pwm > /sys/bus/platform/drivers/stm32-pwm/unbind
[  187.182113] stm32-pwm 40007000.timer:pwm: stm32_pwm_apply:454
[  188.164769] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:657
[  188.184555] stm32-pwm 40007000.timer:pwm: stm32_pwm_remove:659
root@crown:~# [  192.236423] platform 40007000.timer:pwm: stm32_pwm_apply:456
[  192.240727] 8<--- cut here ---
[  192.243759] Unable to handle kernel NULL pointer dereference at virtual address 0000001c when read
...

Even without the crash you can see that stm32_pwm_apply() is still
running after pwmchip_remove() completed.

I'm unsure if the device link could be improved here to ensure that the
fan is completely unbound even if it started unbinding already before
the pwm device gets unbound. (And if it could, would this fit the device
links purpose and so be a sensible improvement?)

If not, the only possible other fix is to make sure that the pwm
callbacks are serialized with pwmchip_remove() and stop calling the
driver callbacks when pwmchip_remove() was called. For that the
pwm_chip struct must stay around until all consumers are really gone. So
the pwm_chip must not be allocated using devm_kzalloc for the pwm's
parent device.

Am I missing something? Is this good enough to convince you that we need
the concept of devm_pwmchip_alloc() from this thread?

My preferred way to continue fixing that would be to get all the
preparing cleanups into next soon to keep my patch stack smaller.
Another pre-condition to serializing the pwm callbacks is (I think) that
all low-level drivers must be fixed to not call pwm-API functions[1].

Then I'll prepare switching all drivers based on this series plus some
more patches to introduce a struct device in struct gpio_chip to track
the lifetime and eventually fix the issue demonstrated above.

While addressing this issue isn't a hard requirement for my final plan
to introduce /dev/pwmchip character devices, fixing it now before the
pwm core is complicated with the character device code should be easier
at least. (Also these character devices introduce another "user" of
pwm_chips and so another reason that these shouldn't be allocated using
devm_kzalloc.)

Best regards
Uwe

[1] first approximation:

	$ git grep -Ew 'pwm_(apply|get_state|is_enabled|set_period|get_period|[sg]et_duty_cycle|get_polarity|enable|disable)' v6.5-rc1 -- drivers/pwm/pwm-*.c | wc -l
	43

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux