On Fri, Nov 24, 2023 at 05:00:36PM +0100, Bartosz Golaszewski wrote: > Hi! > > I've been scratching my head over it for a couple days and I wanted to > pick your brains a bit. > > The existing locking in GPIOLIB is utterly broken. We have a global > spinlock that "protects" the list of GPIO devices but also the > descriptor objects (and who knows what else). I put "protects" in > quotation marks because the spinlock is released and re-acquired in > several places where the code needs to call functions that can > possibly sleep. I don't have to tell you it makes the spinlock useless > and doesn't protect anything. > You have to tell me, cos I don't think it makes it useless, it just constrains the contexts in which it is providing protection. No lock provides protection when it isn't locked - the problem here is the assumption that it does. Depending on the scope of the locking, and what else it may block, releasing and re-aquiring may be the correct strategy - just don't assume state across the unlock. If there is no lock held during driver callbacks, then they, and anything they call, need to be aware that things can change, and GPIOLIB needs to ensure that nothing the driver is accessing is freed until the callbacks return. > An example of that is gpiod_request_commit() where in the time between > releasing the lock in order to call gc->request() and acquiring it > again, gpiod_free_commit() can be called, thus undoing a part of the > changes we just introduced in the first part of this function. We'd > then return from gc->request() and continue acting like we've just > requested the GPIO leading to undefined behavior. > So GPIOLIB needs to check for that after re-acquiring the lock. > There are more instances of this pattern. This seems to be a way to > work around the fact that we have GPIO API functions that can be > called from atomic context (gpiod_set/get_value(), > gpiod_direction_input/output(), etc.) that in their implementation > call driver callbacks that may as well sleep (gc->set(), > gc->direction_output(), etc.). > Which is fine - as long as GPIOLIB is aware that things can change while it doesn't hold the lock. And as long as it doesn't go freeing objects still in use. > Protecting the list of GPIO devices is simple. It should be a mutex as > the list should never be modified from atomic context. This can be > easily factored out right now. > Protecting GPIO descriptors is > trickier. If we use a spinlock for that, we'll run into problems with > GPIO drivers that can sleep. If we use a mutex, we'll have a problem > with users calling GPIO functions from atomic context. > Generally I would advocate for not holding locks over callbacks. If the callback requires locks then it (or the code it calls) should request them itself. GPIOLIB just needs to ensure the desc isn't freed during the callback. The contract for the driver callbacks is not clear to me. e.g. can they be called concurrently (e.g. different cdev requests trying to set different lines and so both calling gc->set()) If so, are the drivers aware of that? If not, a mutex over the callbacks would seem very appropriate to enforce that contract, independent of protecting the desc. (though ideally the drivers are be aware that they need to provide their own locking as appropriate) > One idea I have is introducing a strict limit on which functions can > be used from atomic context (we don't enforce anything ATM in > functions that don't have the _cansleep suffix in their names) and > check which parts of the descriptor struct they modify. Then protect > these parts with a spinlock in very limited critical sections. Have a > mutex for everything else that can only be accessed from process > context. > ... and needs to call cansleep within the critical section itself. > Another one is introducing strict APIs like gpiod_set_value_atomic() > that'll be designed to be called from atomic context exclusively and > be able to handle it. Everything else must only be called from process > context. This of course would be a treewide change as we'd need to > modify all GPIO calls in interrupt handlers. > > I'd like to hear your ideas as this change is vital before we start > protecting gdev->chip with SRCU in all API calls. > As above, I'm not clear on the driver callback contract, so there is a good chance anything more specific I have to add would be incoherent. I do have concerns that adding or changing locking could inadvertantly constrain otherwise valid concurrency, but safe is better than fast. But safe AND fast is ideal. Sorry if that is obvious and not much help. Cheers, Kent.