Dear Bartosz, On Tue, 11 Mar 2025 11:21:10 +0100 Bartosz Golaszewski <brgl@xxxxxxxx> wrote: > On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@xxxxxxxxxxx> wrote: > > > > > > Dear Bartosz, > > > > I noticed this because after updating the kernel from 6.11 to 6.14 a > > user-space application that uses GPIOs heavily started getting extremely slow, > > to the point that I will need to heavily modify this application in order to > > be usable again. > > I traced the problem down to the following patch that went into 6.13: > > > > fcc8b637c542 gpiolib: switch the line state notifier to atomic > > > > What happens here, is that gpio_chrdev_release() now calls > > atomic_notifier_chain_unregister(), which uses RCU, and as such must call > > synchronize_rcu(). synchronize_rcu() waits for the RCU grace time to expire > > before returning and according to the documentation can cause a delay of up to > > several milliseconds. In fact it seems to take between 8-10ms on my system (an > > STM32MP153C single-core Cortex-A7). > > > > This has the effect that the time it takes to call close() on a /dev/gpiochipX > > takes now ~10ms each time. If I git-revert this commit, close() will take less > > than 1ms. > > > > Thanks for the detailed report! Thanks to you for making this patch in such a way that it is easy to revert without breaking stuff! That was a read time-saver while diagnosing. > > 10ms doesn't sound like much, but it is more ~10x the time it tool before, > > and unfortunately libgpiod code calls this function very often in some places, > > especially in find_line() if your board has many gpiochips (mine has 16 > > chardevs). > > Yeah, I imagine it can affect the speed of execution of gpiofind, > gpiodetect and any other program that iterates over all character > devices. Indeed, it does. My application is written in python and uses the python gpiod module. Even in such an environment the impact is killing. > > The effect can easily be reproduced with the gpiofind tool: > > > > Running on kernel 6.12: > > > > $ time gpiofind LPOUT0 > > gpiochip7 9 > > real 0m 0.02s > > user 0m 0.00s > > sys 0m 0.01s > > > > Running on kernel 6.13: > > > > $ time gpiofind LPOUT0 > > gpiochip7 9 > > real 0m 0.19s > > user 0m 0.00s > > sys 0m 0.01s > > > > That is almost a 10x increase in execution time of the whole program!! > > > > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was > > on 6.12. > > > > Unfortunately I can't come up with an easy solution to this problem, that's > > why I don't have a patch to propose. Sorry for that. > > > > I still think it is a bit alarming this change has such a huge impact. IMHO it > > really shouldn't. What can be done about this? Is it maybe possible to defer > > unregistering and freeing to a kthread and return from the release function > > earlier? > > > > This was my first idea too. Alternatively we can switch to using a raw > notifier and provide a spinlock ourselves. That would probably be a good alternative, although gpiod_line_state_notify() wouldn't benefit from the zero-lock RCU implementation and incur a spin_lock penalty. Arguably, this is probably a lot more performance-critical than closing the chardev, so maybe the atomic notifier isn't a bad idea... we just need to deal with the writing side so that user-space doesn't have to wait for the RCU grace period? Certainly, I suppose switching to the raw notifier is the easier fix. OTOH, I know from my own experience that the penalty of a spin-lock does matter sometimes and not having it in the performance critical path is probably nice to have. Best regards, -- David Jander