On Wed, 17 Apr 2024 19:47:18 +0100 Dmitry Safonov wrote:
> 1. [  240.001391][  T833]  Possible interrupt unsafe locking scenario:
> [  240.001391][  T833]
> [  240.001635][  T833]        CPU0                    CPU1
> [  240.001797][  T833]        ----                    ----
> [  240.001958][  T833]   lock(&p->alloc_lock);
> [  240.002083][  T833]                                local_irq_disable();
> [  240.002284][  T833]                                lock(&ndev->lock);
> [  240.002490][  T833]                                lock(&p->alloc_lock);
> [  240.002709][  T833]   <Interrupt>
> [  240.002819][  T833]     lock(&ndev->lock);
> [  240.002937][  T833]
> [  240.002937][  T833]  *** DEADLOCK ***
>
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537021/14-self-connect-ipv6/stderr
>
> 2. [  251.411647][   T71] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> [  251.411986][   T71] 6.9.0-rc1-virtme #1 Not tainted
> [  251.412214][   T71] -----------------------------------------------------
> [  251.412533][   T71] kworker/u16:1/71 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> [  251.412837][   T71] ffff888005182c28 (&p->alloc_lock){+.+.}-{2:2}, at: __get_task_comm+0x27/0x70
> [  251.413214][   T71]
> [  251.413214][   T71] and this task is already holding:
> [  251.413527][   T71] ffff88802f83efd8 (&ul->lock){+.-.}-{2:2}, at: rt6_uncached_list_flush_dev+0x138/0x840
> [  251.413887][   T71] which would create a new lock dependency:
> [  251.414153][   T71]  (&ul->lock){+.-.}-{2:2} -> (&p->alloc_lock){+.+.}-{2:2}
> [  251.414464][   T71]
> [  251.414464][   T71] but this new dependency connects a SOFTIRQ-irq-safe lock:
> [  251.414808][   T71]  (&ul->lock){+.-.}-{2:2}
>
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537201/17-icmps-discard-ipv4/stderr
>
> 3. [  264.280734][    C3]  Possible unsafe locking scenario:
> [  264.280734][    C3]
> [  264.280968][    C3]        CPU0                    CPU1
> [  264.281117][    C3]        ----                    ----
> [  264.281263][    C3]   lock((&tw->tw_timer));
> [  264.281427][    C3]                                lock(&hashinfo->ehash_locks[i]);
> [  264.281647][    C3]                                lock((&tw->tw_timer));
> [  264.281834][    C3]   lock(&hashinfo->ehash_locks[i]);
>
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/547461/19-self-connect-ipv4/stderr
>
> I can spend some time on them after I verify that my fix for -stable
> is actually fixing an issue I think it fixes.
> Seems like your automation + my selftests are giving some fruits, hehe.

Oh, very interesting, I don't recall these coming up before.
We try to extract crashes but apparently we're missing lockdep splats.
I'll try to improve the extraction logic...
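
FWIW, a rough idea of what that could look like -- purely a sketch, not
the actual bot code, and Python is only an assumption about what the
scripts use. The marker strings are headers lockdep prints (including
the ones in the splats above, not an exhaustive list); the function
name, regex and limits are made up for illustration:

import re

# Headers lockdep prints for the common splat flavours (illustrative,
# not exhaustive).
LOCKDEP_MARKERS = (
    "possible circular locking dependency detected",
    "SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected",
    "possible irq lock inversion dependency detected",
    "possible recursive locking detected",
    "Possible interrupt unsafe locking scenario",
    "Possible unsafe locking scenario",
)

# Strip the "[  240.001391][  T833] " style console prefix.
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\[\s*[TC]\d+\]\s?")

def extract_lockdep_splats(log_text, max_lines=60):
    """Return lockdep splat snippets found in a console log."""
    lines = [TS_RE.sub("", l) for l in log_text.splitlines()]
    splats = []
    i = 0
    while i < len(lines):
        if any(m in lines[i] for m in LOCKDEP_MARKERS):
            # Grab a bounded chunk; "*** DEADLOCK ***" usually ends the
            # interesting part, otherwise stop after max_lines.
            chunk = []
            for line in lines[i:i + max_lines]:
                chunk.append(line)
                if "*** DEADLOCK ***" in line:
                    break
            splats.append("\n".join(chunk))
            i += len(chunk)
        else:
            i += 1
    return splats

Keying on the headers rather than only on "*** DEADLOCK ***" would also
catch splats like #2, which (at least in the quoted part) doesn't show a
DEADLOCK line.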