On Thu, Feb 23, 2023 at 10:22 PM Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote: > > On Thu, Feb 23, 2023 at 10:05 PM Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote: > > > > On Thu, Feb 23, 2023 at 9:35 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > On Thu, Feb 23, 2023 at 9:25 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote: > > > > > > > > On Fri, Feb 24, 2023 at 12:36:05AM +0000, Zhang, Qiang1 wrote: > > > > > On Thu, Feb 23, 2023 at 08:43:05AM +0000, Zhang, Qiang1 wrote: > > > > > > > From: Zqiang <qiang1.zhang@xxxxxxxxx> > > > > > > > Sent: Thursday, February 23, 2023 2:30 PM > > > > > > > To: paulmck@xxxxxxxxxx; frederic@xxxxxxxxxx; quic_neeraju@xxxxxxxxxxx; > > > > > > > joel@xxxxxxxxxxxxxxxxx > > > > > > > Cc: rcu@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx > > > > > > > Subject: [PATCH] rcu-tasks: Directly invoke rcuwait_wake_up() in > > > > > > > call_rcu_tasks_generic() > > > > > > > > > > > > > > According to commit '3063b33a347c ("Avoid raw-spinlocked wakeups from > > > > > > > call_rcu_tasks_generic()")', the grace-period kthread is delayed to wakeup > > > > > > > using irq_work_queue() is because if the caller of > > > > > > > call_rcu_tasks_generic() holds a raw spinlock, when the kernel is built with > > > > > > > CONFIG_PROVE_RAW_LOCK_NESTING=y, due to a spinlock will be hold in > > > > > > > wake_up(), so the lockdep splats will happen. but now using > > > > > > > rcuwait_wake_up() to wakeup grace-period kthread instead of wake_up(), in > > > > > > > rcuwait_wake_up() no spinlock will be acquired, so this commit remove using > > > > > > > > > > > > > >There are still spinlock-acquisition and spinlock-release invocations within the call path from rcuwait_wake_up(). > > > > > > > > > > > > > >rcuwait_wake_up() -> wake_up_process() -> try_to_wake_up(), then: > > > > > > > > > > > > > > raw_spin_lock_irqsave() > > > > > > > ... > > > > > > > raw_spin_unlock_irqrestore > > > > > > > > > > > > Yes, but this is raw_spinlock acquisition and release(note: spinlock will convert to > > > > > > sleepable lock in Preempt-RT kernel, but raw spinlock is not change). > > > > > > > > > > > > acquire raw_spinlock -> acquire spinlock will trigger lockdep warning. > > > > > > > > > > > >Is this really safe in the long run though? I seem to remember there are > > > > > >weird locking dependencies if RCU is used from within the scheduler [1]. > > > > > > > > > > > > > > > > > > > > > I have been running rcutorture with rcutorture.type = tasks-tracing, > > > > > so far no problems have been found. > > > > > > > > > > > > > > > >I prefer to keep it as irq_work_queue() unless you are seeing some benefit. > > > > > >Generally, there has to be a 'win' or other justification for adding more > > > > > >risk. > > > > > > > > > > > >thanks, > > > > > > > > > > > >- Joel > > > > > >[1] http://www.joelfernandes.org/rcu/scheduler/locking/2019/09/02/rcu-schedlocks.html > > > > > > > > > > > > > > > The problem in this link, in an earlier RCU version, rcu_read_unlock_special() > > > > > Invoke wakeup and enter scheduler can lead to deadlock, but my modification is for > > > > > call_rcu_tasks_generic(), even if there is a lock dependency problem, we should pay > > > > > more attention to rcu_read_unlock_trace_special() > > > > > > > > Consider ABBA deadlocks as well, not just self-deadlocks (which IIRC is what > > > > the straight-RCU rcu_read_unlock() issues were about). > > > > > > > > What prevents the following scenario? > > > > > > > > In the scheduler you have code like this: > > > > rq = task_rq_lock(p, &rf); > > > > trace_sched_wait_task(p); > > > > > > > > Someone can hook up a BPF program to that tracepoint that then calls > > > > rcu_read_unlock_trace() -> rcu_read_unlock_trace_special(). All of > > > > this while holding the rq and pi scheduler locks. > > > > > > > > That's A (rq lock) -> B (rtpcp lock). > > > > In rcu_read_unlock_trace_special(), the premise of acquiring the rtpcp lock is that > > before that, we have task switch in the rcu_read_lock_trace/unlock_trace critical section. > > but after we already hold the rq lock, no task switch is generated in the > > rcu_read_lock_trace/unlock_trace critical section. > > > > Please correct me if my understanding is wrong. > > > >Yes, but in the next reply I corrected myself and I am still concerned > >about ABBA. There is obviously *some lock* that is held by the callers > >of call_rcu_tasks*(). So there is a dependency that gets created > >between _that_ lock and the rq lock, if you do a wakeup here. And I > >am not sure whether that lock is also acquired when the BPF program > >runs. If it is, then the BPF programs may hang. It is probably worth > >checking with the BPF guys. > > > >More importantly, do you see a benefit with this change in terms of > >anything more than deleting a few lines of code? Paul typically favors > >robustness and guard rails (as do I), unless there is significant > >benefit in performance, power or both. > > because I found that the purpose of using irq_work_queue() early is to solve the problem of lockep splat, > my modified junior is also to avoid unnecessary IPI. irq_work_queue() which this code uses does a self-IPI, unlike irq_work_queue_on() which AFAIK is significantly cheaper on ARM64 than cross-CPU IPIs. Not sure about x86 though. Another way to avoid IPI sometimes is making the IRQ work lazy. It will then do self-IPI only if the tick is turned off. But I'm not sure if that is any better than leaving the code as-is. > but like you said, indeed we are not completely sure > whether there is a potential lock dependency problem, so I agree your opinion. Ok and thanks for digging into it. This was fun! - Joel > > Thanks > Zqiang > > > > >Thanks, > > > > - Joel