On Thu, 24 Oct 2024 08:49:30 +0200, Bartosz Golaszewski <brgl@xxxxxxxx> said: > On Wed, Oct 23, 2024 at 11:05 PM Mark Brown <broonie@xxxxxxxxxx> wrote: >> >> On Fri, Oct 18, 2024 at 11:10:16AM +0200, Bartosz Golaszewski wrote: >> > From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx> >> > >> > We currently only notify user-space about line config changes that are >> > made from user-space. Any kernel config changes are not signalled. >> > >> > Let's improve the situation by emitting the events closer to the source. >> > To that end let's call the relevant notifier chain from the functions >> > setting direction, gpiod_set_config(), gpiod_set_consumer_name() and >> > gpiod_toggle_active_low(). This covers all the options that we can >> > inform the user-space about. We ignore events which don't have >> > corresponding flags exported to user-space on purpose - otherwise the >> > user would see a config-changed event but the associated line-info would >> > remain unchanged. >> >> Today's -next is not booting on several of my platforms, including >> beaglebone-black, i.MX8MP-EVK and pine64plus. Bisects are pointing at >> this commit, and i.MX8MP-EVK is actually giving some console output: >> >> [ 2.502208] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 >> [ 2.511036] Mem abort info: >> >> ... >> >> [ 2.679934] Call trace: >> [ 2.682379] gpiod_direction_output+0x34/0x5c >> [ 2.686745] i2c_register_adapter+0x59c/0x670 >> [ 2.691111] __i2c_add_numbered_adapter+0x58/0xa8 >> [ 2.695822] i2c_add_adapter+0xa0/0xd0 >> [ 2.699578] i2c_add_numbered_adapter+0x2c/0x38 >> [ 2.704117] i2c_imx_probe+0x2d0/0x640 >> >> which looks plausible given the change. >> >> Full boot log for i.MX8MP-EVK: >> >> https://lava.sirena.org.uk/scheduler/job/887083 >> >> Bisect log for that, the others look similar (the long run of good/bad >> tags at the start for random commits is my automation pulling test >> results it already knows about in the affected range to try to speed up >> the bisect): >> > > Hi Mark! > > Any chance you could post the output of > > scripts/faddr2line drivers/gpio/gpiolib.o gpiod_direction_output+0x34/0x5c > > for that build? > > Bart > While I can't really reproduce it, I've looked at what could be wrong and figured that the NULL-pointer in question can possibly be the line_state_notifier. I realized that for some historical reasons we add the GPIO device to the global list before it's fully set up - including initializing the notifier. In some corner cases (devlinks borked?) this could lead to consumers requesting GPIOs before their provider is ready. Mark: could you try the following diff and let me know if it works? diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index ae758ba6dc3d..4258acac0162 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -987,45 +987,6 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gdev->ngpio = gc->ngpio; gdev->can_sleep = gc->can_sleep; - - scoped_guard(mutex, &gpio_devices_lock) { - /* - * TODO: this allocates a Linux GPIO number base in the global - * GPIO numberspace for this chip. In the long run we want to - * get *rid* of this numberspace and use only descriptors, but - * it may be a pipe dream. It will not happen before we get rid - * of the sysfs interface anyways. - */ - base = gc->base; - if (base < 0) { - base = gpiochip_find_base_unlocked(gc->ngpio); - if (base < 0) { - ret = base; - base = 0; - goto err_free_label; - } - - /* - * TODO: it should not be necessary to reflect the - * assigned base outside of the GPIO subsystem. Go over - * drivers and see if anyone makes use of this, else - * drop this and assign a poison instead. - */ - gc->base = base; - } else { - dev_warn(&gdev->dev, - "Static allocation of GPIO base is deprecated, use dynamic allocation.\n"); - } - - gdev->base = base; - - ret = gpiodev_add_to_list_unlocked(gdev); - if (ret) { - chip_err(gc, "GPIO integer space overlap, cannot add chip\n"); - goto err_free_label; - } - } - ATOMIC_INIT_NOTIFIER_HEAD(&gdev->line_state_notifier); BLOCKING_INIT_NOTIFIER_HEAD(&gdev->device_notifier); @@ -1103,6 +1064,45 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, if (ret) goto err_remove_irqchip; } + + scoped_guard(mutex, &gpio_devices_lock) { + /* + * TODO: this allocates a Linux GPIO number base in the global + * GPIO numberspace for this chip. In the long run we want to + * get *rid* of this numberspace and use only descriptors, but + * it may be a pipe dream. It will not happen before we get rid + * of the sysfs interface anyways. + */ + base = gc->base; + if (base < 0) { + base = gpiochip_find_base_unlocked(gc->ngpio); + if (base < 0) { + ret = base; + base = 0; + goto err_free_label; + } + + /* + * TODO: it should not be necessary to reflect the + * assigned base outside of the GPIO subsystem. Go over + * drivers and see if anyone makes use of this, else + * drop this and assign a poison instead. + */ + gc->base = base; + } else { + dev_warn(&gdev->dev, + "Static allocation of GPIO base is deprecated, use dynamic allocation.\n"); + } + + gdev->base = base; + + ret = gpiodev_add_to_list_unlocked(gdev); + if (ret) { + chip_err(gc, "GPIO integer space overlap, cannot add chip\n"); + goto err_free_label; + } + } + return 0; err_remove_irqchip: Thanks Bartosz