Re: Improving the hot-unplug event handling in gpiolib

Kent Gibson <warthog618@xxxxxxxxx> · Sun, 2 Jul 2023 21:03:24 +0800

On Thu, Jun 29, 2023 at 11:39:15AM +0200, Bartosz Golaszewski wrote:
> Hey Kent, Andy, Linus et al!
> 
> I've been spending time looking at the gpiolib code and trying to
> figure out all that's wrong with hot-unplugging code in order to
> finally fix the issues that lead to various memory corruptions and
> crashes (or at least paper over the issues enough to make it
> semi-reliable at hot-unplug events). I did some experiments with
> promising results but before I start coding for real, I wanted to run
> this by you.
> 
> The easiest way to trigger a crash is instantiating a gpio-sim chip,
> running gpiomon or gpionotify on one of its lines, removing the chip
> from configfs (simulating an unplug event) and then killing the
> program. This will consistently show up in KASAN output as a
> use-after-free bug. The same can be achieved for a real-life example
> with the cp2112 driver which exposes a GPIO chip over USB.
> 
> There are several issues here, and I also list some possible solutions.
> 
> 1. Not waking up the poll when the chips disappears.
> 
> We should call wake_up_poll() on the relevant wait queue with
> (EPOLLERR | EPOLLHUP) when the chip goes down to force a return to
> user-space from poll(). This way, the current syscall will not be
> stuck with a chip gone from under it and any sybsequent
> read()/write()/poll() will fail the gdev->chip check.
> 
> 2. Freeing descriptors and irqs after the GPIO chip is gone.
> 
> The GPIO device is gone after the call to gpiochip_remove(). Meanwhile
> linereq, lineevent etc. are reference counted and are only released
> after the last reference to their owning file descriptor is gone. This
> means that userspace process needs to call close() on the relevant fd
> or exit before we start releasing the resources associated with these
> structures. This is fine for fields only used by the code in
> gpiolib-cdev.c but for system-wide resources, we need to free them
> right when gpiochip_remove() is called and "numb down" the cdev
> structs like we do with the GPIO chip.
> 
> To that end I propose to reuse the blocking notifier in struct
> gpio_device: let's add a notifer block to struct linereq (I'll focus
> on this one here) and get notified when gpiochip_remove() is called.
> We can now wake up the poll and free all resources that must not be
> used anymore.
> 
> 3. No locking for struct linereq, lineevent etc.
> 
> File operations only take the inode lock and don't protect any data
> touched by the callbacks. I think this can lead to some still hidden
> issues and especially when we start freeing resources when notified by
> core gpiolib. There's config_mutex in struct linereq but it seems to
> me that it doesn't protect all data everywhere and linereq is also
> accessed from interrupt context, for instance in edge_irq_handler().
> 
> We should add spinlocks to every cdev context structure and take care
> to protect all accesses correctly.
> 
> I'm sure the above list is not complete and other issues will be
> uncovered as we move but it's what I figured out so far.
> 
> Let me know what you think and if any of this makes sense.
> 

My biggest concern here is that cdev is being treated as a special case,
when it should just be another gpiolib user.
Potentially the prototypical gpiolib user, but still just a gpiolib user.
More specifically, a gpiodev and gpiod user, NOT a gpiochip user.

IMHO all the gpiochip functions that cdev calls should be delegated to
gpiolib functions that perform the gdev->chip mapping, with appropriate
locking, and then the gpiochip function.
e.g. gpiochip_get_desc(chip, offset) -> gpiodev_get_desc(gdev, offset).
Then cdev never directly accesses the chip - that is gpiolib's problem.

That reduces the problem to notifying the gpiod user when the chip is
removed. Given such a notification, the gpiod user can release any
associated resources, including all the descs and the gdev.
So gpiochip_remove() would trigger the removal notification, after numbing
the device, and wait for all descs using it to be released before putting
the device.  It might be good to incorporate the registration of a
removal handler into the gpio_device_get() itself, to ensure that the user
provides one.

In the case of cdev, the removal notification for a given request would
trigger numbing all the ioctls, releasing the associated descs and irqs
and the gdev itself, and signalling the user via the request fd.

Wrt point 2 - the removal notification, so that would be similar to
the block notifier in struct gpio_chardev_data, which is already hanging off
the gpio_device notifier, and that receives GPIOLINE_CHANGED_* events?
But it would only receive removal events??  Or do you expect it to filter the
GPIOLINE_CHANGED_* events?

No conceptual problem with using a blocking_notifier for this, but maybe a
dedicated one rather then piggybacking on the GPIOLINE_CHANGED_*
notifier?

Be mindful of the race condition where the chip is removed between the user
requesting the desc and registering the notifier callback.

Wrt point 3, not sure how the inode lock is relevant here.
The lock is not held during ioctl calls, right?
If the inode lock IS held, then the ioctls would be inherently serialized,
and the config_mutex would be redundant, as its purpose is to prevent
ioctls concurrently twiddling with the desc flags and debounce_period.
FWIW, I've always assumed the ioctls can be concurrent.

And while the linereq is accessed from interrupt context in
edge_irq_thread(), the contained desc and irq are not.

I don't think there is any need to go adding spin locks everywhere.
Each field has its access and locking strategy documented in its kernel-doc.
If you have concrete cases where that is incorrect or the locking is
deficient then please share.

Anyway, cdev will need to coordinate the notifier callback with its
other operations to ensure the callback doesn't release the descs until
they are numbed - including removing the irqs.
Need to think about the best way to handle that.  It may require a mutex
to serialize request ops as the descs are held during calls like
gpiod_get_value_cansleep(desc), so can't use a spin lock.

Does that make sense?

Cheers,
Kent.