On Tue, Jun 22, 2021 at 01:51:24PM +0200, Frederic Weisbecker wrote:
> On Tue, Jun 22, 2021 at 01:21:59PM +0200, Steffen Klassert wrote:
> > On Mon, Jun 21, 2021 at 01:05:28PM +0200, Steffen Klassert wrote:
> > > On Mon, Jun 21, 2021 at 11:11:18AM +0200, Varad Gautam wrote:
> > > >
> > > > Right, I misread the call chain - security_xfrm_policy_lookup does
> > > > not reach xfrm_policy_lookup, making this patch unnecessary. The
> > > > bug I have is:
> > > >
> > > > T1, holding hash_resize_mutex and sleeping inside synchronize_rcu:
> > > >
> > > >  __schedule
> > > >  schedule
> > > >  schedule_timeout
> > > >  wait_for_completion
> > > >  __wait_rcu_gp
> > > >  synchronize_rcu
> > > >  xfrm_hash_resize
> > > >
> > > > And T2 producing RCU stalls since it is blocked on the mutex:
> > > >
> > > >  __schedule
> > > >  schedule
> > > >  __rt_mutex_slowlock
> > > >  rt_mutex_slowlock_locked
> > > >  rt_mutex_slowlock
> > > >  xfrm_policy_lookup_bytype.constprop.77
> > >
> > > Ugh, why does xfrm_policy_lookup_bytype use a mutex? This is called
> > > in the receive path inside a softirq.
> > >
> > > The bug was introduced by:
> > >
> > > commit 77cc278f7b202e4f16f8596837219d02cb090b96
> > > Author: Ahmed S. Darwish <a.darwish@xxxxxxxxxxxxx>
> > > Date:   Mon Jul 20 17:55:22 2020 +0200
> > >
> > >     xfrm: policy: Use sequence counters with associated lock
> > >
> > >     A sequence counter write side critical section must be protected
> > >     by some form of locking to serialize writers. If the serialization
> > >     primitive is not disabling preemption implicitly, preemption has
> > >     to be explicitly disabled before entering the sequence counter
> > >     write side critical section.
> > >
> > >     A plain seqcount_t does not contain the information of which lock
> > >     must be held when entering a write side critical section.
> > >
> > >     Use the new seqcount_spinlock_t and seqcount_mutex_t data types
> > >     instead, which allow to associate a lock with the sequence
> > >     counter. This enables lockdep to verify that the lock used for
> > >     writer serialization is held when the write side critical section
> > >     is entered.
> > >
> > >     If lockdep is disabled this lock association is compiled out and
> > >     has neither storage size nor runtime overhead.
> > >
> > >     Signed-off-by: Ahmed S. Darwish <a.darwish@xxxxxxxxxxxxx>
> > >     Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > >     Link: https://lkml.kernel.org/r/20200720155530.1173732-17-a.darwish@xxxxxxxxxxxxx
> > >
> > > This uses a seqcount_mutex_t for xfrm_policy_hash_generation, that's
> > > wrong.
> >
> > Varad, can you try to replace the seqcount_mutex_t for
> > xfrm_policy_hash_generation with a seqcount_spinlock_t? I'm not
> > familiar with those seqcount changes, but we should not end up using
> > a mutex in this codepath.
>
> Something like this? (beware, untested, also I don't know if the read
> side should then disable bh, doesn't look necessary for PREEMPT_RT,
> but I may be missing something...)

Looking a bit deeper into this, it seems that the problem is that
xfrm_policy_hash_generation and hash_resize_mutex do not protect the
same thing.

hash_resize_mutex protects user configuration against a worker thread
that rebalances the hash buckets. xfrm_policy_hash_generation protects
user configuration against the data path that runs in softirq.

Finally, the following line from xfrm_init() ties these two together:

seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);

That looks a bit odd. This line was also introduced with the
above-mentioned patch.
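
To illustrate that direction, here is a rough and completely untested
sketch of what tying xfrm_policy_hash_generation to a spinlock instead
of hash_resize_mutex could look like. The names xfrm_policy_hash_lock,
xfrm_policy_hash_sketch_init() and xfrm_hash_resize_sketch() are
placeholders invented for the example, not existing symbols, and the
locally defined hash_resize_mutex only stands in for the real one. The
point is just that the seqcount write side is serialized by a lock that
is never held across synchronize_rcu():

#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/seqlock.h>
#include <linux/spinlock.h>

/* Placeholder lock, standing in for whatever spinlock really serializes
 * the hash table switch in net/xfrm/xfrm_policy.c. */
static DEFINE_SPINLOCK(xfrm_policy_hash_lock);
static DEFINE_MUTEX(hash_resize_mutex);	/* stand-in for the real one */
static seqcount_spinlock_t xfrm_policy_hash_generation;

static void xfrm_policy_hash_sketch_init(void)
{
	/* Associate the seqcount with the spinlock that covers the actual
	 * table switch, instead of hash_resize_mutex as in xfrm_init(). */
	seqcount_spinlock_init(&xfrm_policy_hash_generation,
			       &xfrm_policy_hash_lock);
}

static void xfrm_hash_resize_sketch(void)
{
	/* hash_resize_mutex still serializes the resize worker against
	 * configuration changes, but the seqcount (and with it the
	 * softirq read side) no longer depends on it. */
	mutex_lock(&hash_resize_mutex);

	spin_lock_bh(&xfrm_policy_hash_lock);
	write_seqcount_begin(&xfrm_policy_hash_generation);
	/* ... publish the new policy hash tables ... */
	write_seqcount_end(&xfrm_policy_hash_generation);
	spin_unlock_bh(&xfrm_policy_hash_lock);

	/* Wait for softirq readers of the old tables before freeing them.
	 * Only hash_resize_mutex is held across this sleep, and readers
	 * never have to take that mutex anymore. */
	synchronize_rcu();
	/* ... free the old tables ... */

	mutex_unlock(&hash_resize_mutex);
}

static void xfrm_policy_lookup_sketch(void)
{
	unsigned int seq;

	/* Roughly the read side pattern of xfrm_policy_lookup_bytype(),
	 * running in softirq: retry the lookup if a resize raced with us. */
	rcu_read_lock();
	do {
		seq = read_seqcount_begin(&xfrm_policy_hash_generation);
		/* ... walk the policy hash chain ... */
	} while (read_seqcount_retry(&xfrm_policy_hash_generation, seq));
	rcu_read_unlock();
}

With such an association, a reader that races with a resize at worst
waits for the short spinlock-protected table switch, rather than for a
mutex whose owner sleeps in synchronize_rcu() for a full grace period.
At least that is the idea, the actual fix may of course look different.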