Improving the hot-unplug event handling in gpiolib

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Kent, Andy, Linus et al!

I've been spending time looking at the gpiolib code and trying to
figure out all that's wrong with hot-unplugging code in order to
finally fix the issues that lead to various memory corruptions and
crashes (or at least paper over the issues enough to make it
semi-reliable at hot-unplug events). I did some experiments with
promising results but before I start coding for real, I wanted to run
this by you.

The easiest way to trigger a crash is instantiating a gpio-sim chip,
running gpiomon or gpionotify on one of its lines, removing the chip
from configfs (simulating an unplug event) and then killing the
program. This will consistently show up in KASAN output as a
use-after-free bug. The same can be achieved for a real-life example
with the cp2112 driver which exposes a GPIO chip over USB.

There are several issues here, and I also list some possible solutions.

1. Not waking up the poll when the chips disappears.

We should call wake_up_poll() on the relevant wait queue with
(EPOLLERR | EPOLLHUP) when the chip goes down to force a return to
user-space from poll(). This way, the current syscall will not be
stuck with a chip gone from under it and any sybsequent
read()/write()/poll() will fail the gdev->chip check.

2. Freeing descriptors and irqs after the GPIO chip is gone.

The GPIO device is gone after the call to gpiochip_remove(). Meanwhile
linereq, lineevent etc. are reference counted and are only released
after the last reference to their owning file descriptor is gone. This
means that userspace process needs to call close() on the relevant fd
or exit before we start releasing the resources associated with these
structures. This is fine for fields only used by the code in
gpiolib-cdev.c but for system-wide resources, we need to free them
right when gpiochip_remove() is called and "numb down" the cdev
structs like we do with the GPIO chip.

To that end I propose to reuse the blocking notifier in struct
gpio_device: let's add a notifer block to struct linereq (I'll focus
on this one here) and get notified when gpiochip_remove() is called.
We can now wake up the poll and free all resources that must not be
used anymore.

3. No locking for struct linereq, lineevent etc.

File operations only take the inode lock and don't protect any data
touched by the callbacks. I think this can lead to some still hidden
issues and especially when we start freeing resources when notified by
core gpiolib. There's config_mutex in struct linereq but it seems to
me that it doesn't protect all data everywhere and linereq is also
accessed from interrupt context, for instance in edge_irq_handler().

We should add spinlocks to every cdev context structure and take care
to protect all accesses correctly.

I'm sure the above list is not complete and other issues will be
uncovered as we move but it's what I figured out so far.

Let me know what you think and if any of this makes sense.

Bartosz



[Index of Archives]     [Linux SPI]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux