On 6/21/21 10:29 AM, Steffen Klassert wrote:
> On Fri, Jun 18, 2021 at 04:11:01PM +0200, Varad Gautam wrote:
>> Commit "xfrm: policy: Read seqcount outside of rcu-read side in
>> xfrm_policy_lookup_bytype" [Linked] resolved a locking bug in
>> xfrm_policy_lookup_bytype that causes an RCU reader-writer deadlock on
>> the mutex wrapped by xfrm_policy_hash_generation on PREEMPT_RT since
>> 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated
>> lock").
>>
>> However, xfrm_sk_policy_lookup can still reach xfrm_policy_lookup_bytype
>> while holding rcu_read_lock(), as:
>> xfrm_sk_policy_lookup()
>>   rcu_read_lock()
>>   security_xfrm_policy_lookup()
>>     xfrm_policy_lookup()
>
> Hm, I don't see that call chain. security_xfrm_policy_lookup() calls
> a hook with the name xfrm_policy_lookup. The only LSM that has
> registered a function to that hook is selinux. It registers
> selinux_xfrm_policy_lookup() and I don't see how we can call
> xfrm_policy_lookup() from there.
>
> Did you actually trigger that bug?
>

Right, I misread the call chain - security_xfrm_policy_lookup does not reach
xfrm_policy_lookup, making this patch unnecessary.

The bug I have is:

T1, holding hash_resize_mutex and sleeping inside synchronize_rcu:

  __schedule
  schedule
  schedule_timeout
  wait_for_completion
  __wait_rcu_gp
  synchronize_rcu
  xfrm_hash_resize

And T2, producing RCU stalls since it is blocked on the mutex:

  __schedule
  schedule
  __rt_mutex_slowlock
  rt_mutex_slowlock_locked
  rt_mutex_slowlock
  xfrm_policy_lookup_bytype.constprop.77
  __xfrm_policy_check
  udpv6_queue_rcv_one_skb
  __udp6_lib_rcv
  ip6_protocol_deliver_rcu
  ip6_input_finish
  ip6_input
  ip6_mc_input
  ipv6_rcv
  __netif_receive_skb_one_core
  process_backlog
  net_rx_action
  __softirqentry_text_start
  __local_bh_enable_ip
  ip6_finish_output2
  ip6_output
  ip6_send_skb
  udp_v6_send_skb
  udpv6_sendmsg
  sock_sendmsg
  ____sys_sendmsg
  ___sys_sendmsg
  __sys_sendmsg
  do_syscall_64

So, despite the patch here [1], there is another way to reach
xfrm_policy_lookup_bytype from within an RCU read-side section, which on
PREEMPT_RT will deadlock with xfrm_hash_resize.

Does softirq processing on RT happen within rcu_read_lock/unlock? That would
explain the stalls.

[1] https://lore.kernel.org/r/20210528160407.32127-1-varad.gautam@xxxxxxxx/

Regards,
Varad

-- 
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany

HRB 36809, AG Nürnberg
Geschäftsführer: Felix Imendörffer