On Fri, Nov 25, 2022 at 10:03:06PM +0100, Bartosz Golaszewski wrote: > On Fri, Nov 25, 2022 at 6:56 PM Andy Shevchenko > <andriy.shevchenko@xxxxxxxxxxxxxxx> wrote: > > > > On Fri, Nov 25, 2022 at 05:48:02PM +0100, Bartosz Golaszewski wrote: > > > On Fri, Nov 25, 2022 at 5:24 PM Kent Gibson <warthog618@xxxxxxxxx> wrote: > > > > ... > > > > > Then at the subsystem level, the GPIO device struct would need a lock > > > that would be taken by every user-space operation AND the code > > > unregistering the device so that we don't do what you described (i.e. > > > if there's a thread doing a read(), then let's wait until it returns > > > before we drop the device). > > > > It's called a reference counting, basically you need to get device and then > > put when it makes sense. > > > > Andy: I am aware of struct device reference counting but this isn't > it. You can count references all you want, but when I disconnect my > CP2112, the USB bus calls gpiochip_remove(), struct gpio_chip * inside > struct gpio_device is set to NULL and while the underlying struct > device itself is still alive, the GPIO chip is no longer usable. > > Reference counting won't help because the device is no longer there, > so this behavior is correct but there's an issue with user-space still > being able to hold certain resources and we need to make sure that > when it tries to use them, we return an error instead of crashing. > > I think that a good solution is to make sure, we cannot set gdev->gc > to NULL as long as there are user-space operations in progress. After > all, it's better to try to send a USB request to an unplugged device > than to dereference a NULL pointer. To that end, we could have a > user-space lock that would also be taken by gpiochip_remove(). > This is basically the answer I was hoping for - that there is some barrier in place to prevent chip removal while an ioctl is active. Then the check makes total sense - it is ensuring that the chip wasn't removed before the ioctl began and the barrier went up. On the other end, the caller of gpiochip_remove() needs to be prepared to gracefully fail calls on the chip until gpiochip_remove() returns. You would hope that is already the case... > But this is still a per-subsystem solution. Most other subsystems > suffer from the same issue. > Does that prevent us addressing the problem in gpio until a more general solution comes along? Anyway, I'm basically ok with your patch as a first step, as it greatly reduces the chances of triggering the fault, but it is only a band-aid over a larger issue and a more complete solution would be preferable. Without that, highlight in the checkin comment that it is not a complete fix. Cheers, Kent.