On Fri, Nov 25, 2022 at 5:24 PM Kent Gibson <warthog618@xxxxxxxxx> wrote: > > On Fri, Nov 25, 2022 at 04:32:57PM +0100, Bartosz Golaszewski wrote: > > From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx> > > > > There are several places where we can crash the kernel by requesting > > lines, unbinding the GPIO device, then calling any of the system calls > > relevant to the GPIO character device's annonymous file descriptors: > > ioctl(), read(), poll(). > > > > While I observed it with the GPIO simulator, it will also happen for any > > of the GPIO devices that can be hot-unplugged - for instance any HID GPIO > > expander (e.g. CP2112). > > > > This affects both v1 and v2 uAPI. > > > > Fix this by simply checking if the GPIO chip pointer is not NULL. > > > > Fixes: ?? > > And split for v1 and v2 as the Fixes for those will differ? > > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx> > > --- > > drivers/gpio/gpiolib-cdev.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c > > index 0cb6b468f364..d5632742942a 100644 > > --- a/drivers/gpio/gpiolib-cdev.c > > +++ b/drivers/gpio/gpiolib-cdev.c > > @@ -201,6 +201,9 @@ static long linehandle_ioctl(struct file *file, unsigned int cmd, > > unsigned int i; > > int ret; > > > > + if (!lh->gdev->chip) > > + return -ENODEV; > > + > > Is there anything to prevent the chip being removed by another thread > between this check and subsequent usage? > Eh... not really, no. The issue we have here seems to be the same as the one Laurent Pinchart described back in 2015[1] and revisited during his 2022 linux plumbers presentation[2], except he blamed it on devres, whereas I think the problem is much deeper and devres has nothing to do with it. Ideally we'd need a global fortifying of the driver model against hot-unplug related device unbinding. After a quick glance at the relevant code, I think we'd need to add some flag to struct file for the vfs to check on any fs operation and return an error early if user-space tries to operate on an fd associated with a removed device. I'm not sure yet if that's feasible globally or even the right solution at all - just brainstorming here. Then at the subsystem level, the GPIO device struct would need a lock that would be taken by every user-space operation AND the code unregistering the device so that we don't do what you described (i.e. if there's a thread doing a read(), then let's wait until it returns before we drop the device). This wouldn't fix the case in which the same situation happened in a kernel driver but crashing the kernel from within is a much lesser offense than allowing user-space to crash it. So this patch is just papering over for now I suppose. Bart [1] https://lkml.org/lkml/2015/7/14/741 [2] https://www.youtube.com/watch?v=kW8LHWlJPTU