Search Linux Wireless

iwlwifi ax210 soft lockup on upstream 6.11.0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

We ran some tests on 6.11.0 plus a single patch to fix the VRF deadlock,
and I believe we are hitting a spin-forever type of bug in the iwlwifi
transmit logic.

This bug is reproducible after an hour or two of heavy traffic load
across 12 ax210 radios.  There are lots of mac-vlans on the system
as well, but I don't see indication that is part of the problem.

This same test case, on a more heavily patched kernel, sometimes shows a tcp_ack
crash in what appears to be a corrupted list.  I'm not sure if the
lockup below is directly related to that or not, but possibly
a corrupted list is making the gso logic spin forever?

This kernel has KASAN, lockdep, list debugging, and everything else
I know to turn on to debug this sort of thing.

The full log of the lockup is found here:

https://www.candelatech.com/downloads/6.11.0dbg30sept.txt


watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [btserver:28762]
Modules linked in: tls nf_conntrack_netlink nf_conntrack nfnetlink btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat jfs nls_ucs2_utils xfs nf_defrag_ipv6 nf_defrag_ipv4 vrf 8021q garp mrp stp llc macvlan pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr iwlmvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt ee1004 snd_hda_scodec_component intel_pmc_bxt intel_rapl_msr iTCO_vendor_support snd_hda_intel snd_intel_dspcfg coretemp intel_rapl_common snd_hda_codec intel_uncore_frequency snd_hda_core intel_uncore_frequency_common mac80211 snd_hwdep snd_seq snd_seq_device snd_pcm iwlwifi intel_tcc_cooling x86_pkg_temp_thermal snd_timer intel_powerclamp intel_wmi_thunderbolt i2c_i801 snd i2c_smbus pcspkr soundcore i2c_mux bfq cfg80211 mei_hdcp mei_pxp intel_pch_thermal intel_pmc_core intel_vsec pmt_telemetry pmt_class acpi_pad sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram raid1 dm_raid raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq i915 cec rc_core drm_buddy intel_gtt drm_display_helper ixgbe drm_kms_helper igb mdio ttm dca i2c_algo_bit agpgart hwmon drm mei_wdt xhci_pci i2c_core xhci_pci_renesas video wmi fuse [last unloaded: nfnetlink]
irq event stamp: 462966149
hardirqs last  enabled at (462966148): [<ffffffff81a5688b>] ___slab_alloc+0x2cb/0x12e0
hardirqs last disabled at (462966149): [<ffffffff83eb133a>] sysvec_apic_timer_interrupt+0xa/0x90
softirqs last  enabled at (461971904): [<ffffffff8129bd51>] __irq_exit_rcu+0x91/0xc0
softirqs last disabled at (461971910): [<ffffffff835b0afa>] __dev_queue_xmit+0x21a/0x36e0
CPU: 1 UID: 0 PID: 28762 Comm: btserver Tainted: G        W    L     6.11.0+ #2
Tainted: [W]=WARN, [L]=SOFTLOCKUP
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:unwind_next_frame+0x35b/0x2320
Code: 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e b9 15 00 00 41 c7 45 00 00 00 00 00 31 c0 48 83 c4 38 5b 5d 41 5c 41 5d <41> 5e 41 5f c3 49 81 ff 00 a0 10 86 0f 83 af 01 00 00 4c 89 ff e8
RSP: 0018:ffff88820c31e180 EFLAGS: 00000296
RAX: 0000000000000001 RBX: ffffffff814ed640 RCX: ffffffff85fb5dea
RDX: 1ffff11041863c35 RSI: 0000000000000000 RDI: ffff88820c31e1a8
RBP: ffff88820c31e230 R08: ffffffff85fb5dee R09: ffff88820c31e1d8
R10: ffff88820c31e198 R11: 0000000000030000 R12: ffff88820c31e260
R13: 0000000000000000 R14: ffff88820c31ff48 R15: 0000000000000001
FS:  00007f340183a7c0(0000) GS:ffff88841da80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000149adaa8 CR3: 000000013a2c8005 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 ? watchdog_timer_fn+0x2d9/0x3b0
 ? lockup_detector_update_enable+0x50/0x50
 ? __hrtimer_run_queues+0x495/0x860
 ? trace_hrtimer_cancel+0x130/0x130
 ? ktime_get_update_offsets_now+0x87/0x2c0
 ? hrtimer_interrupt+0x2c4/0x7d0
 ? __sysvec_apic_timer_interrupt+0xbc/0x380
 ? sysvec_apic_timer_interrupt+0x69/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_apic_timer_interrupt+0x16/0x20
 ? write_profile+0xe0/0xe0
 ? unwind_next_frame+0x35b/0x2320
 arch_stack_walk+0x7d/0xe0
 stack_trace_save+0x8b/0xc0
 ? filter_irq_stacks+0x90/0x90
 ? kasan_save_stack+0x2a/0x40
 ? kasan_save_stack+0x1c/0x40
 ? kasan_save_track+0x10/0x30
 ? __kasan_slab_alloc+0x62/0x70
 ? kmem_cache_alloc_node_noprof+0xf1/0x2e0
 ? kmalloc_reserve+0x21d/0x4d0
 kasan_save_stack+0x1c/0x40
 ? kasan_save_stack+0x1c/0x40
 ? kasan_save_track+0x10/0x30
 ? __kasan_slab_alloc+0x62/0x70
 ? kmem_cache_alloc_node_noprof+0xf1/0x2e0
 ? __alloc_skb+0x267/0x390
 ? skb_segment+0xad2/0x4260
 ? tcp_gso_segment+0x311/0x1ac0
 ? inet_gso_segment+0x465/0x15a0
 ? skb_mac_gso_segment+0x227/0x540
 ? __skb_gso_segment+0x2bf/0x5a0
 ? iwl_mvm_tx_tso_segment+0x7b4/0x1070 [iwlmvm]
 ? iwl_mvm_tx_skb_sta+0x10b1/0x1700 [iwlmvm]
 ? iwl_mvm_tx_skb+0x12/0x60 [iwlmvm]
 ? iwl_mvm_mac_itxq_xmit+0x1d9/0x3e0 [iwlmvm]
 ? ieee80211_queue_skb+0x10c9/0x2010 [mac80211]
 ? __ieee80211_xmit_fast+0x5f9/0x37f0 [mac80211]
 ? __ieee80211_subif_start_xmit+0x6d4/0x1150 [mac80211]
 ? ieee80211_subif_start_xmit+0xf2/0x1630 [mac80211]
 ? dev_hard_start_xmit+0xf0/0x570
 ? __dev_queue_xmit+0x794/0x36e0
 ? ip_finish_output2+0x10cd/0x1ff0
 ? __ip_queue_xmit+0x85f/0x1e20
 ? __tcp_transmit_skb+0x1699/0x3850
 ? tcp_write_xmit+0x14ac/0x8b50
 ? __tcp_push_pending_frames+0x8b/0x310
 ? tcp_sendmsg_locked+0x20de/0x31e0
 ? tcp_sendmsg+0x22/0x40
 ? __sock_sendmsg+0xbc/0x150
 ? __sys_sendto+0x19c/0x250
 ? __x64_sys_sendto+0xd7/0x1b0
 ? do_syscall_64+0x69/0x160
 ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
 ? ___slab_alloc+0x2cb/0x12e0
 ? kasan_save_track+0x10/0x30
 ? __kasan_kmalloc+0x83/0x90
 ? __kmalloc_node_noprof+0x35f/0x420
 ? alloc_slab_obj_exts+0x4c/0xa0
 ? alloc_slab_obj_exts+0x4c/0xa0
 ? kasan_save_track+0x10/0x30
 ? __kasan_slab_alloc+0x62/0x70
 kasan_save_track+0x10/0x30
 __kasan_slab_alloc+0x62/0x70
 kmem_cache_alloc_node_noprof+0xf1/0x2e0
 ? __build_skb_around+0x235/0x3f0
 __alloc_skb+0x267/0x390
 ? __napi_build_skb+0x40/0x40
 ? skb_segment+0xc19/0x4260
 ? kasan_check_range+0x109/0x190
 ? __asan_memcpy+0x38/0x60
 skb_segment+0xad2/0x4260
 ? pskb_extract+0x230/0x230
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? print_usage_bug.part.0+0x600/0x600
 tcp_gso_segment+0x311/0x1ac0
 ? skb_still_in_host_queue+0x250/0x250
 ? lock_is_held_type+0xa5/0x110
 inet_gso_segment+0x465/0x15a0
 skb_mac_gso_segment+0x227/0x540
 ? mark_lock+0x101/0x16c0
 ? skb_gso_validate_mac_len+0x270/0x270
 ? print_usage_bug.part.0+0x600/0x600
 __skb_gso_segment+0x2bf/0x5a0
 iwl_mvm_tx_tso_segment+0x7b4/0x1070 [iwlmvm]
 ? invoke_tx_handlers_late+0x15c/0x2950 [mac80211]
 ? lock_is_held_type+0xa5/0x110
 ? ieee80211_xmit_fast_finish+0xa8/0xa80 [mac80211]
 ? iwl_mvm_skb_prepare_status+0x60/0x60 [iwlmvm]
 ? iwl_mvm_get_lmac_id+0x3d/0x80 [iwlmvm]
 iwl_mvm_tx_skb_sta+0x10b1/0x1700 [iwlmvm]
 ? iwl_mvm_max_amsdu_size+0x750/0x750 [iwlmvm]
 ? mark_lock+0x101/0x16c0
 ? lock_release+0x211/0x680
 ? ieee80211_queue_skb+0xe93/0x2010 [mac80211]
 iwl_mvm_tx_skb+0x12/0x60 [iwlmvm]
 iwl_mvm_mac_itxq_xmit+0x1d9/0x3e0 [iwlmvm]
 ? iwl_mvm_mac_wake_tx_queue+0x48/0x2d0 [iwlmvm]
 ? lockdep_hardirqs_on_prepare+0x127/0x3e0
 ieee80211_queue_skb+0x10c9/0x2010 [mac80211]
 ? lockdep_hardirqs_on_prepare+0x327/0x3e0
 __ieee80211_xmit_fast+0x5f9/0x37f0 [mac80211]
 ? netdev_nl_stats_write_tx+0x167/0x760
 ? reacquire_held_locks+0x456/0x4e0
 ? ieee80211_clear_fast_xmit+0xd0/0xd0 [mac80211]
 ? skb_gso_validate_mac_len+0x270/0x270
 ? __skb_gso_segment+0x2bf/0x5a0
 ? lock_is_held_type+0xa5/0x110
 __ieee80211_subif_start_xmit+0x6d4/0x1150 [mac80211]
 ? find_held_lock+0x2d/0x110
 ? __ieee80211_xmit_fast+0x37f0/0x37f0 [mac80211]
 ? reacquire_held_locks+0x4e0/0x4e0
 ieee80211_subif_start_xmit+0xf2/0x1630 [mac80211]
 ? lock_release+0x211/0x680
 ? dev_queue_xmit_nit+0x720/0xa40
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __ieee80211_subif_start_xmit+0x1150/0x1150 [mac80211]
 ? __skb_clone+0x547/0x730
 ? run_filter+0x310/0x310
 ? dev_queue_xmit_nit+0x725/0xa40
 ? dev_hard_start_xmit+0xf0/0x570
 dev_hard_start_xmit+0xf0/0x570
 __dev_queue_xmit+0x794/0x36e0
 ? __lock_acquire+0x1c01/0x6300
 ? netdev_core_pick_tx+0x2e0/0x2e0
 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
 ? print_usage_bug.part.0+0x600/0x600
 ? find_held_lock+0x2d/0x110
 ? lock_acquire+0x19a/0x4f0
 ? find_held_lock+0x2d/0x110
 ? lock_release+0x211/0x680
 ? __ip_queue_xmit+0x85f/0x1e20
 ? reacquire_held_locks+0x4e0/0x4e0
 ? lock_is_held_type+0xa5/0x110
 ? mark_held_locks+0x9e/0xe0
 ? nf_hook+0x2de/0x430
 ip_finish_output2+0x10cd/0x1ff0
 ? lock_is_held_type+0x63/0x110
 ? ip_fragment.constprop.0+0x220/0x220
 ? __ip_finish_output+0x30c/0x520
 __ip_queue_xmit+0x85f/0x1e20
 ? __skb_clone+0x547/0x730
 __tcp_transmit_skb+0x1699/0x3850
 ? __tcp_select_window+0xfb0/0xfb0
 ? __alloc_skb+0x2c6/0x390
 ? __napi_build_skb+0x40/0x40
 tcp_write_xmit+0x14ac/0x8b50
 ? copy_page_to_iter+0x100/0x100
 __tcp_push_pending_frames+0x8b/0x310
 tcp_sendmsg_locked+0x20de/0x31e0
 ? print_usage_bug.part.0+0x5c3/0x600
 ? tcp_sendmsg_fastopen+0x6e0/0x6e0
 ? lock_release+0x211/0x680
 ? tcp_sendmsg+0x14/0x40
 ? inet_send_prepare+0x3d0/0x3d0
 ? inet_send_prepare+0x3d0/0x3d0
 tcp_sendmsg+0x22/0x40
 __sock_sendmsg+0xbc/0x150
 __sys_sendto+0x19c/0x250
 ? __ia32_sys_getpeername+0xa0/0xa0
 ? handle_irq_event+0x107/0x1b0
 ? __fget_light+0x4d/0x1d0
 ? __common_interrupt+0x62/0x110
 ? ksys_read+0x166/0x1c0
 ? __ia32_sys_pwrite64+0x1d0/0x1d0
 __x64_sys_sendto+0xd7/0x1b0
 ? lockdep_hardirqs_on_prepare+0x275/0x3e0
 do_syscall_64+0x69/0x160
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f3401b13680
Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
RSP: 002b:00007ffdb34580b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000000000000ffdc RCX: 00007f3401b13680
RDX: 0000000000010000 RSI: 000000002bc4f000 RDI: 0000000000000077
RBP: 00007ffdb34581a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000723
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
 </TASK>

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux