Re: [PATCH v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 30-10-2019 13:42, Linus Walleij wrote:
On Wed, Oct 30, 2019 at 10:31 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

The problem here is that gpiochip_add_data_with_key() calls gpiochip_irqchip_init_hw()
before it calls gpiochip_irqchip_init_valid_mask(), so after commit 88583e340a0e
when byt_gpio_irq_init_hw runs gc->irq.valid_mask is NULL and we crash with a NULL
pointer exception (or so I believe, the kernel never gets far enough to get
any info out of it without extra work).

Note that this ("[PATCH v1] pinctrl: baytrail: Move IRQ valid mask initialization to a dedicated callback")
patch fixes this since it moves the gc->irq.valid_mask accesses to
byt_init_irq_valid_mask.

OK so we have a halfway fix there.

But this change itself triggers another variant of this ordering issue,
it causes these 2 new errors to get logged:

byt_gpio INT33FC:01: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x01800000
byt_gpio INT33FC:02: GPIO interrupt error, pins misconfigured. INT_STAT0: 0x00400000

The problem is that before this change the code calculating the valid_mask
would also disable interrupts on GPIOs which do not have their
BYT_DIRECT_IRQ_EN bit set. This now happens after the check done in
byt_gpio_irq_init_hw() causing these false-positive error messages.

Isn't that easily fixed with something like this:

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 9afbc0612126..e865c889ba8d 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1411,11 +1411,11 @@ int gpiochip_add_data_with_key(struct
gpio_chip *chip, void *data,

         machine_gpiochip_add(chip);

-       ret = gpiochip_irqchip_init_hw(chip);
+       ret = gpiochip_irqchip_init_valid_mask(chip);
         if (ret)
                 goto err_remove_acpi_chip;

-       ret = gpiochip_irqchip_init_valid_mask(chip);
+       ret = gpiochip_irqchip_init_hw(chip);
         if (ret)
                 goto err_remove_acpi_chip;

(I sent a separate patch for this.)

Yes I just replied to that patch, this seems like a good fix
to me.

It isn't super-easy to know the right ordering semantics
for init_hw vs init_valid_mask I think. Sadly we need to
test it out in practice.

Ack.

Even if we ignore the NULL pointer deref problem for now and we ignore
these 2 new error messages for now. Things are still broken with the
current changes in pinctrl/intel.git/for-next switching to letting
devm_gpiochip_add_data register the irqchip means that
acpi_gpiochip_request_interrupts() gets called before
gpiochip_add_pin_range() is called from pinctrl-baytrail.c, causing
the GPIO lookup of any ACPI _AEI handlers to fail, resulting in
errors like this one:

byt_gpio INT33FC:02: Failed to request GPIO for pin 0x13: -517

And none of the _AEI handlers working

I just vaguely understand this...

If what you're saying is that the Baytrail driver is dependent
on registering the pin ranges *before* registering the GPIO
chip

Yes I think so, I debugged the _AEI handlers not working a bit
yesterday and the problem is that pinctrl_gpio_request() fails,
first pinctrl_get_device_gpio_range fails with -EPROBEDEFER (*)
and then pinctrl_match_gpio_range() also fails. I added some
debug pr_err calls to pinctrl_match_gpio_range() and when it runs
the range for the gpiochip for which acpi_gpiochip_request_interrupts()
is processing _AEI event-handlers is not yet in the registered
ranges.

*) Which is not treated specially by acpi_gpiochip_request_interrupts()
as that normally gets called from the gpiochip driver itself, so the
device is expected to alreayd be probed.


can we then:

diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c
b/drivers/pinctrl/intel/pinctrl-baytrail.c
index beb26550c25f..b308567c5153 100644
--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
+++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
@@ -1549,16 +1549,20 @@ static int byt_gpio_probe(struct byt_gpio *vg)
                 girq->handler = handle_bad_irq;
         }

-       ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg);
+       /*
+        * Needs to happen first since the gpiochip is using pin
+        * control as back-end.
+        */
+       ret = gpiochip_add_pin_range(gc, dev_name(&vg->pdev->dev),
+                                    0, 0, vg->soc_data->npins);
         if (ret) {
-               dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n");
+               dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n");
                 return ret;
         }

-       ret = gpiochip_add_pin_range(&vg->chip, dev_name(&vg->pdev->dev),
-                                    0, 0, vg->soc_data->npins);
+       ret = devm_gpiochip_add_data(&vg->pdev->dev, gc, vg);
         if (ret) {
-               dev_err(&vg->pdev->dev, "failed to add GPIO pin range\n");
+               dev_err(&vg->pdev->dev, "failed adding byt-gpio chip\n");
                 return ret;
         }

(Tell me if I should send this as a separate patch.)

If you want me to test if this fixes the issue, then yes please.

It's not entirely logical to have this semantic ordering so
the extra comment explains it, I hope, in case it actually
works.

TL;DR: commit 88583e340a0e ("pinctrl: intel: baytrail: Pass irqchip when adding gpiochip")
breaks a bunch of stuff and should be dropped from pinctrl/intel.git/for-next
and this needs some more work before it is ready for mainline.

I don't know if that is such a good idea if this is a global problem,
like something that would potentially disturb any ACPI-based
GPIO chip. We might leave something else broken even if we
fix the issue locally.

Right, I did a quick check and at least these x86 pinctrl drivers
all 3 have this ordering problem once the irq chip registration is
moved to the gpiochip_add_data() call.

drivers/pinctrl/intel/pinctrl-baytrail.c
drivers/pinctrl/intel/pinctrl-cherryview.c
drivers/pinctrl/intel/pinctrl-intel.c
drivers/pinctrl/pinctrl-amd.c

And it seems that drivers/gpio/gpio-merrifield.c is already
suffering from this problem in 5.4!

So some more generic solution would be ideal...

Regards,

Hans





[Index of Archives]     [Linux SPI]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux