On Fri, Mar 15, 2019 at 04:44:52PM +0900, Byungchul Park wrote: > On 03/15/2019 04:31 PM, Byungchul Park wrote: > > On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote: > > > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote: > > > > RCU's dyntick-idle code is written to tolerate half-interrupts, that it, > > > > either an interrupt that invokes rcu_irq_enter() but never invokes the > > > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never > > > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit() > > > > on the other. These things really did happen at one time, as evidenced > > > > by this ca-2011 LKML post: > > > > > > > > http://lkml.kernel.org/r/20111014170019.GE2428@xxxxxxxxxxxxxxxxxx > > > > > > > > The reason why RCU tolerates half-interrupts is that usermode helpers > > > > used exceptions to invoke a system call from within the kernel such that > > > > the system call did a normal return (not a return from exception) to > > > > the calling context. This caused rcu_irq_enter() to be invoked without > > > > a matching rcu_irq_exit(). However, usermode helpers have since been > > > > rewritten to make much more housebroken use of workqueues, kernel threads, > > > > and do_execve(), and therefore should no longer produce half-interrupts. > > > > No one knows of any other source of half-interrupts, but then again, > > > > no one seems insane enough to go audit the entire kernel to verify that > > > > half-interrupts really are a relic of the past. > > > > > > > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will > > > > trigger in the presence of half interrupts, which the code will continue > > > > to handle correctly. If neither of these WARN_ON_ONCE() trigger by > > > > mid-2021, then perhaps RCU can stop handling half-interrupts, which > > > > would be a considerable simplification. > > > Hi Paul and everyone, > > > I was thinking some more about this patch and whether we can simplify this code > > > much in 2021. Since 2021 is a bit far away, I thought working on it in again to > > > keep it fresh in memory is a good idea ;-) > > > > > > To me it seems we cannot easily combine the counters (dynticks_nesting and > > > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a > > > half-interrupt scenario (assuming simplication means counter combining like > > > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these > > > 2 counters need to be tracked separately as they are used differently in the > > > following function: > > Hi Joel and Paul, > > > > I always love the way to logically approach problems so I'm a fan of > > all your works :) But I'm JUST curious about something here. Why can't > > we combine them the way I tried even if we confirm no possibility of > > half-interrupt? IMHO, the only thing we want to know through calling > > rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from > > RCU-idle or not - of course assuming the caller context always be an > > well-defined interrupt context like e.g. the tick handler. > > > > So the function can return true if the caller is within a RCU-idle > > region except a well-known single interrupt nested. > > > > Of course, now that we cannot confirm it yet, the crowbar is necessary. > > But does it still have a problem even after confirming it? Why? What am > > I missing? Could you explain why for me? :( > > > Did you also want to consider the case the function is called from others > than > well-known interrupt contexts? If yes, then I agree with you, there doesn't > seem to be the kind of code and it's not a good idea to let the function be > called generally though. We were discussing exactly this on the thread. I am going to be adding a lockdep check to make sure the function isn't called from anywhere but an interrupt context. Then once we can confirm that there are no more half-interrupts in the future, we can apply your counter combining approach. Based on the comments on the rrupt_from_idle() function, it wasn't clear to me if the function was intended to be called from any context. That's why I thought the counter combing approach would not work, but you are right - it should work. I will CC you on the lockdep check patch. Also hope you are subscribed to the new shiny rcu@xxxxxxxxxxxxxxx list as well :-) thanks, - Joel