Hi, On 2024-10-24 10:15, Bartosz Golaszewski wrote:
On Thu, 24 Oct 2024 08:49:30 +0200, Bartosz Golaszewski <brgl@xxxxxxxx> said:On Wed, Oct 23, 2024 at 11:05 PM Mark Brown <broonie@xxxxxxxxxx> wrote:On Fri, Oct 18, 2024 at 11:10:16AM +0200, Bartosz Golaszewski wrote:From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx> We currently only notify user-space about line config changes that are made from user-space. Any kernel config changes are not signalled. Let's improve the situation by emitting the events closer to the source. To that end let's call the relevant notifier chain from the functions setting direction, gpiod_set_config(), gpiod_set_consumer_name() and gpiod_toggle_active_low(). This covers all the options that we can inform the user-space about. We ignore events which don't have corresponding flags exported to user-space on purpose - otherwise the user would see a config-changed event but the associated line-info would remain unchanged.Today's -next is not booting on several of my platforms, including beaglebone-black, i.MX8MP-EVK and pine64plus. Bisects are pointing at this commit, and i.MX8MP-EVK is actually giving some console output: [ 2.502208] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 2.511036] Mem abort info: ... [ 2.679934] Call trace: [ 2.682379] gpiod_direction_output+0x34/0x5c [ 2.686745] i2c_register_adapter+0x59c/0x670 [ 2.691111] __i2c_add_numbered_adapter+0x58/0xa8 [ 2.695822] i2c_add_adapter+0xa0/0xd0 [ 2.699578] i2c_add_numbered_adapter+0x2c/0x38 [ 2.704117] i2c_imx_probe+0x2d0/0x640 which looks plausible given the change. Full boot log for i.MX8MP-EVK: https://lava.sirena.org.uk/scheduler/job/887083 Bisect log for that, the others look similar (the long run of good/bad tags at the start for random commits is my automation pulling test results it already knows about in the affected range to try to speed up the bisect):Hi Mark! Any chance you could post the output of scripts/faddr2line drivers/gpio/gpiolib.o gpiod_direction_output+0x34/0x5c for that build? BartWhile I can't really reproduce it, I've looked at what could be wrong and figured that the NULL-pointer in question can possibly be the line_state_notifier. I realized that for some historical reasons we add the GPIO device to the global list before it's fully set up - including initializing the notifier. In some corner cases (devlinks borked?) this could lead to consumers requesting GPIOs before their provider is ready. Mark: could you try the following diff and let me know if it works? diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index ae758ba6dc3d..4258acac0162 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -987,45 +987,6 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gdev->ngpio = gc->ngpio; gdev->can_sleep = gc->can_sleep; - - scoped_guard(mutex, &gpio_devices_lock) { - /* - * TODO: this allocates a Linux GPIO number base in the global - * GPIO numberspace for this chip. In the long run we want to - * get *rid* of this numberspace and use only descriptors, but - * it may be a pipe dream. It will not happen before we get rid - * of the sysfs interface anyways. - */ - base = gc->base; - if (base < 0) { - base = gpiochip_find_base_unlocked(gc->ngpio); - if (base < 0) { - ret = base; - base = 0; - goto err_free_label; - } - - /* - * TODO: it should not be necessary to reflect the - * assigned base outside of the GPIO subsystem. Go over - * drivers and see if anyone makes use of this, else - * drop this and assign a poison instead. - */ - gc->base = base; - } else { - dev_warn(&gdev->dev, - "Static allocation of GPIO base is deprecated, use dynamic allocation.\n"); - } - - gdev->base = base; - - ret = gpiodev_add_to_list_unlocked(gdev); - if (ret) { - chip_err(gc, "GPIO integer space overlap, cannot add chip\n"); - goto err_free_label; - } - } - ATOMIC_INIT_NOTIFIER_HEAD(&gdev->line_state_notifier); BLOCKING_INIT_NOTIFIER_HEAD(&gdev->device_notifier); @@ -1103,6 +1064,45 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, if (ret) goto err_remove_irqchip; } + + scoped_guard(mutex, &gpio_devices_lock) { + /* + * TODO: this allocates a Linux GPIO number base in the global + * GPIO numberspace for this chip. In the long run we want to + * get *rid* of this numberspace and use only descriptors, but + * it may be a pipe dream. It will not happen before we get rid + * of the sysfs interface anyways. + */ + base = gc->base; + if (base < 0) { + base = gpiochip_find_base_unlocked(gc->ngpio); + if (base < 0) { + ret = base; + base = 0; + goto err_free_label; + } + + /* + * TODO: it should not be necessary to reflect the + * assigned base outside of the GPIO subsystem. Go over + * drivers and see if anyone makes use of this, else + * drop this and assign a poison instead. + */ + gc->base = base; + } else { + dev_warn(&gdev->dev, + "Static allocation of GPIO base is deprecated, use dynamic allocation.\n"); + } + + gdev->base = base; + + ret = gpiodev_add_to_list_unlocked(gdev); + if (ret) { + chip_err(gc, "GPIO integer space overlap, cannot add chip\n"); + goto err_free_label; + } + } + return 0; err_remove_irqchip: Thanks Bartosz
I think I hit the same, or a similar bug, on my Edgerouter 6P (Cavium Octeon III):
CPU 3 Unable to handle kernel paging request at virtual address 0000000000000000, epc == ffffffff817a40c8, ra == ffffffff817a40a4
Oops[#1]:CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc4-next-20241023-00032-g01c8e35f8d89 #84
... Call Trace:gpiod_direction_output (drivers/gpio/gpiolib.c:4164 drivers/gpio/gpiolib.c:2700 drivers/gpio/gpiolib.c:2694) i2c_register_adapter (drivers/i2c/i2c-core-base.c:396 drivers/i2c/i2c-core-base.c:422 drivers/i2c/i2c-core-base.c:434 drivers/i2c/i2c-core-base.c:1574)
octeon_i2c_probe (drivers/i2c/busses/i2c-octeon-platdrv.c:247) (the complete log is attached)Unfortunately the oops remains after applying the diff and the call trace looks to be the same.
Please let me know if there's anything else you need. Regards, Klara Modin
Attachment:
gpiod_oops_decoded.gz
Description: application/gzip
# bad: [30226ad165626fa1d00945758ecafcfbdb47aed0] sched/ext: fix fmt fmt__str inconsistency git bisect start 'HEAD' # status: waiting for good commit(s), bad commit known # good: [c2ee9f594da826bea183ed14f2cc029c719bf4da] KVM: selftests: Fix build on on non-x86 architectures git bisect good c2ee9f594da826bea183ed14f2cc029c719bf4da # good: [1f64fb9c2040c018d0e045b785b3d11a3bc0bf61] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git git bisect good 1f64fb9c2040c018d0e045b785b3d11a3bc0bf61 # good: [d72293fcdadf4ca9fe265c9057e2d4e4b8c3fa7f] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git git bisect good d72293fcdadf4ca9fe265c9057e2d4e4b8c3fa7f # good: [d9e7d56ac23ba13e1255fa114aa1b90993386a40] Merge branch 'char-misc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git git bisect good d9e7d56ac23ba13e1255fa114aa1b90993386a40 # bad: [a37d9a1b19767866febad59765806042bc49ad7c] Merge branch 'gpio/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git git bisect bad a37d9a1b19767866febad59765806042bc49ad7c # good: [1096ccc17707eb50f677a6457bf65527bdf13d51] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git git bisect good 1096ccc17707eb50f677a6457bf65527bdf13d51 # good: [c6b673dd67833f12a52eedb2d7bb2429d7d95a8d] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git git bisect good c6b673dd67833f12a52eedb2d7bb2429d7d95a8d # good: [3bd13ae04ccc20e3a312596f89a269b8b6416dca] gpio: menz127: simplify error path and remove remove() git bisect good 3bd13ae04ccc20e3a312596f89a269b8b6416dca # bad: [3aba8402910bfab998d5cf6c84744de5db466936] gpio: grgpio: remove remove() git bisect bad 3aba8402910bfab998d5cf6c84744de5db466936 # good: [81625f362497509526a2f9ac53843ae30b4709cc] gpio: cdev: go back to storing debounce period in the GPIO descriptor git bisect good 81625f362497509526a2f9ac53843ae30b4709cc # good: [fcc8b637c542d1a0c19e5797ad72f9258e10464c] gpiolib: switch the line state notifier to atomic git bisect good fcc8b637c542d1a0c19e5797ad72f9258e10464c # bad: [bc40668def384256673bc18296865869c4a4187b] gpio: grgpio: drop Kconfig dependency on OF_GPIO git bisect bad bc40668def384256673bc18296865869c4a4187b # bad: [07c61d4da43fa3b34c152b28010d20be115a96db] gpiolib: notify user-space about in-kernel line state changes git bisect bad 07c61d4da43fa3b34c152b28010d20be115a96db # first bad commit: [07c61d4da43fa3b34c152b28010d20be115a96db] gpiolib: notify user-space about in-kernel line state changes
Attachment:
gpiod_oops_after_patch_decoded.gz
Description: application/gzip