On Fri, Nov 24, 2023 at 05:00:36PM +0100, Bartosz Golaszewski wrote:
> Hi!
>
> I've been scratching my head over it for a couple days and I wanted to
> pick your brains a bit.
>
> The existing locking in GPIOLIB is utterly broken. We have a global
> spinlock that "protects" the list of GPIO devices but also the
> descriptor objects (and who knows what else). I put "protects" in
> quotation marks because the spinlock is released and re-acquired in
> several places where the code needs to call functions that can
> possibly sleep. I don't have to tell you it makes the spinlock useless
> and doesn't protect anything.
>
> An example of that is gpiod_request_commit() where in the time between
> releasing the lock in order to call gc->request() and acquiring it
> again, gpiod_free_commit() can be called, thus undoing a part of the
> changes we just introduced in the first part of this function. We'd
> then return from gc->request() and continue acting like we've just
> requested the GPIO leading to undefined behavior.
>
> There are more instances of this pattern. This seems to be a way to
> work around the fact that we have GPIO API functions that can be
> called from atomic context (gpiod_set/get_value(),
> gpiod_direction_input/output(), etc.) that in their implementation
> call driver callbacks that may as well sleep (gc->set(),
> gc->direction_output(), etc.).
>
> Protecting the list of GPIO devices is simple. It should be a mutex as
> the list should never be modified from atomic context. This can be
> easily factored out right now. Protecting GPIO descriptors is
> trickier. If we use a spinlock for that, we'll run into problems with
> GPIO drivers that can sleep. If we use a mutex, we'll have a problem
> with users calling GPIO functions from atomic context.
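For illustration, here is a minimal user-space sketch of the mutex-based alternative. It assumes POSIX threads stand in for kernel locking. The per-descriptor lock is a sleepable mutex held across the (possibly sleeping) driver callback, so the free path cannot run in the window that dropping the spinlock opens in gpiod_request_commit(). All model_* names are hypothetical and do not correspond to GPIOLIB code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space model of a GPIO descriptor (not the kernel's
 * struct gpio_desc). */
struct model_desc {
	pthread_mutex_t lock; /* sleepable lock, held across the driver callback */
	bool requested;
	int request_calls;    /* how many times the "driver" callback ran */
};

#define MODEL_DESC_INIT { PTHREAD_MUTEX_INITIALIZER, false, 0 }

/* Stand-in for gc->request(): it may sleep, so it cannot run under a spinlock. */
static int model_chip_request(struct model_desc *d)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 100000 }; /* 100 us */

	nanosleep(&ts, NULL); /* simulate a slow bus transfer */
	d->request_calls++;
	return 0;
}

/* Analogue of gpiod_request_commit(): the lock is never dropped, so a
 * concurrent model_free() cannot slip in while the callback sleeps. */
static int model_request(struct model_desc *d)
{
	int ret;

	pthread_mutex_lock(&d->lock);
	if (d->requested) {
		pthread_mutex_unlock(&d->lock);
		return -EBUSY;
	}
	ret = model_chip_request(d);
	if (ret == 0)
		d->requested = true;
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* Analogue of gpiod_free_commit(): serialized against model_request(). */
static void model_free(struct model_desc *d)
{
	pthread_mutex_lock(&d->lock);
	assert(d->requested); /* freeing an unrequested line is a caller bug */
	d->requested = false;
	pthread_mutex_unlock(&d->lock);
}
```

The trade-off is the one described above. Because model_request() holds a sleepable lock across the callback, it must never be called from atomic context. That is acceptable for request/free, but not for the value and direction accessors that are currently callable from interrupt handlers.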
>
> One idea I have is introducing a strict limit on which functions can
> be used from atomic context (we don't enforce anything ATM in
> functions that don't have the _cansleep suffix in their names) and
> check which parts of the descriptor struct they modify. Then protect
> these parts with a spinlock in very limited critical sections. Have a
> mutex for everything else that can only be accessed from process
> context.
>
> Another one is introducing strict APIs like gpiod_set_value_atomic()
> that'll be designed to be called from atomic context exclusively and
> be able to handle it. Everything else must only be called from process
> context. This of course would be a treewide change as we'd need to
> modify all GPIO calls in interrupt handlers.
>
> I'd like to hear your ideas as this change is vital before we start
> protecting gdev->chip with SRCU in all API calls.

Brief side note: if we can really fix something (even partially) right
now, let's do it; otherwise technical debt will kill us. (I'm mostly
referring to the list of GPIO devices.)

-- 
With Best Regards,
Andy Shevchenko