On Tue, 2 Apr 2024 at 15:54, Andy Shevchenko <andy@xxxxxxxxxx> wrote: > > On Tue, Apr 02, 2024 at 03:41:00PM +0200, Hans de Goede wrote: > > On 3/29/24 4:16 PM, Bartosz Golaszewski wrote: > > > On Fri, 29 Mar 2024 15:11:21 +0100, Hans de Goede <hans@xxxxxxxxx> said: > > ... > > > > Thanks for the report. I hope I'm not being naive here but would the following > > > one-liner work? > > > > > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > > > index ce94e37bcbee..69f365ccbfd8 100644 > > > --- a/drivers/gpio/gpiolib.c > > > +++ b/drivers/gpio/gpiolib.c > > > @@ -1179,7 +1179,7 @@ struct gpio_device *gpio_device_find(const void *data, > > > > > > gc = srcu_dereference(gdev->chip, &gdev->srcu); > > > > > > - if (gc && match(gc, data)) > > > + if (device_is_registered(&gdev->dev) && gc && match(gc, data)) > > > return gpio_device_get(gdev); > > > } > > > > > > This would make gpio_device_find() ignore any GPIO device that's not yet > > > registered on the GPIO bus which is almost the last step of the registration > > > process right before creating the sysfs attributes. > > > > Yes that should work and it has the added advantage that it also waits > > for things like the irqchip to be setup before gpio_device_find() will > > find the gpio-device. > > > > I cannot trigger the race every boot, but I do hit it quite regularly > > and with this change I've done 10 successful consecutive boots, so > > I believe that this indeed fixes the race. > > > > I've prepared a patch with this fix now which I'll send out shortly. > > > > As for Andy's suggestion I'm not all that familiar with the RCU stuff, > > but I think that if we were to go that route then the device_is_registered() > > check should be moved up to above the "guard(srcu)(&gdev->srcu);" > > line rather then above the "gc = srcu_deref..." line, since in that > > case we are not using the gdev->chip pointer at all if we bail ? > > I believe you are right and we need to move this check out of SRCU scope. > (FWIW, I also thought the very same way after I had sent the message and > was hesitating to reply with that.) > Yes, there's no reason to have it inside the SRCU read lock section. It doesn't protect the kobject internals. The variable checked in device_is_registered() is a single bit. Once we see it as set, we can assume the device is registered with its subsystem. Bart