On Wed, Oct 04, 2023 at 01:57:04PM +0100, Lee Jones wrote:
> On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
>
> > On Wed, Oct 04, 2023 at 09:57:20AM +0100, Lee Jones wrote:
> > > On Wed, 04 Oct 2023, Greg Kroah-Hartman wrote:
> > >
> > > > On Wed, Oct 04, 2023 at 05:59:09AM +0000, Starke, Daniel wrote:
> > > > > > Daniel, any thoughts?
> > > > >
> > > > > Our application of this protocol is only with specific modems to enable
> > > > > circuit switched operation (handling calls, selecting/querying networks,
> > > > > etc.) while doing packet switched communication (i.e. IP traffic over PPP).
> > > > > The protocol was developed for such use cases.
> > > > >
> > > > > Regarding the issue itself:
> > > > > There was already an attempt to fix all this by switching from spinlocks to
> > > > > mutexes resulting in ~20% performance loss. However, the patch was reverted
> > > > > as it did not handle the T1 timer leading into sleep during atomic within
> > > > > gsm_dlci_t1() on every mutex lock there.
> > >
> > > That's correct. When I initially saw this report, my initial thought
> > > was to replace the spinlocks with mutexes, but having read the previous
> > > accepted attempt and its subsequent reversion I started to think of
> > > other ways to solve this issue. This solution, unlike the last, does
> > > not involve adding sleep inducing locks into atomic contexts, nor
> > > should it negatively affect performance.
> > >
> > > > > There was also a suggestion to fix this in do_con_write() as
> > > > > tty_operations::write() appears to be documented as "not allowed to sleep".
> > > > > The patch for this was rejected. It did not fix the issue within n_gsm.
> > > > >
> > > > > Link: https://lore.kernel.org/all/20221203215518.8150-1-pchelkin@xxxxxxxxx/
> > > > > Link: https://lore.kernel.org/all/20221212023530.2498025-1-zengheng4@xxxxxxxxxx/
> > > > > Link: https://lore.kernel.org/all/5a994a13-d1f2-87a8-09e4-a877e65ed166@xxxxxxxxxx/
> > > >
> > > > Ok, I thought I remembered this, I'll just drop this patch from my
> > > > review queue and wait for a better solution if it ever comes up as this
> > > > isn't a real issue that people are seeing on actual systems, but just a
> > > > syzbot report.
> > >
> > > What does the "better solution" look like?
> >
> > One that actually fixes the root problem here (i.e. does not break the
> > recursion loop, or cause a performance decrease for normal users, or
> > prevent this from being bound to the console).
>
> Does this solution break the recursion loop or affect performance?

This solution broke the recursion by returning an error, right?

The performance one was by using mutexes as in previous attempts.

thanks,

greg k-h