Re: GPIOLIB locking is broken and how to fix it

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 24, 2023 at 07:27:54PM +0200, Andy Shevchenko wrote:
> On Fri, Nov 24, 2023 at 05:00:36PM +0100, Bartosz Golaszewski wrote:
> > Hi!
> > 
> > I've been scratching my head over it for a couple days and I wanted to
> > pick your brains a bit.
> > 
> > The existing locking in GPIOLIB is utterly broken. We have a global
> > spinlock that "protects" the list of GPIO devices but also the
> > descriptor objects (and who knows what else). I put "protects" in
> > quotation marks because the spinlock is released and re-acquired in
> > several places where the code needs to call functions that can
> > possibly sleep. I don't have to tell you it makes the spinlock useless
> > and doesn't protect anything.
> > 
> > An example of that is gpiod_request_commit() where in the time between
> > releasing the lock in order to call gc->request() and acquiring it
> > again, gpiod_free_commit() can be called, thus undoing a part of the
> > changes we just introduced in the first part of this function. We'd
> > then return from gc->request() and continue acting like we've just
> > requested the GPIO leading to undefined behavior.
> > 
> > There are more instances of this pattern. This seems to be a way to
> > work around the fact that we have GPIO API functions that can be
> > called from atomic context (gpiod_set/get_value(),
> > gpiod_direction_input/output(), etc.) that in their implementation
> > call driver callbacks that may as well sleep (gc->set(),
> > gc->direction_output(), etc.).
> > 
> > Protecting the list of GPIO devices is simple. It should be a mutex as
> > the list should never be modified from atomic context. This can be
> > easily factored out right now. Protecting GPIO descriptors is
> > trickier. If we use a spinlock for that, we'll run into problems with
> > GPIO drivers that can sleep. If we use a mutex, we'll have a problem
> > with users calling GPIO functions from atomic context.
> > 
> > One idea I have is introducing a strict limit on which functions can
> > be used from atomic context (we don't enforce anything ATM in
> > functions that don't have the _cansleep suffix in their names) and
> > check which parts of the descriptor struct they modify. Then protect
> > these parts with a spinlock in very limited critical sections. Have a
> > mutex for everything else that can only be accessed from process
> > context.
> > 
> > Another one is introducing strict APIs like gpiod_set_value_atomic()
> > that'll be designed to be called from atomic context exclusively and
> > be able to handle it. Everything else must only be called from process
> > context. This of course would be a treewide change as we'd need to
> > modify all GPIO calls in interrupt handlers.
> > 
> > I'd like to hear your ideas as this change is vital before we start
> > protecting gdev->chip with SRCU in all API calls.
> 
> Brief side note: If we can really fix something (partially) right now,
> do it, otherwise technical debt kills us.
> 
> (Most likely I refer to the list of the GPIO devices.)

Another brief note is to look at irq_get_desc_lock() / irq_put_desc_unlock()
and related APIs. Yet you might probably not need all the complications and
esp. raw spinlock, but perhaps you can get the idea.

-- 
With Best Regards,
Andy Shevchenko






[Index of Archives]     [Linux SPI]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux