Ben Greear wrote: > We started testing with two AR9380 NICs today (one AP, the other STA). > I applied Felix's skb optimization patch, and the ath9k memleak fix patch > on top of 3.3.7+. > > When we generate traffic using a modified version of pktgen, > the STA interface transmits at around 310Mbps for a minute or > two, but then the system dies of OOM (and maybe worse..having > trouble getting useful serial console log). It died much faster > before Felix's two patches were applied. > > I disabled all of our network related buffer adjustments > (ie, no longer increasing tcp_wmem, etc), and it still > crashes. > > The system has 2GB RAM, but it is 32-bit kernel, so not all > is available to the networking code... That said, the OOM > killer kills VNC and such. > > Anyway, I'll try some memleak debugging to see if > I can find any leaks. It seems to me that we should > not actually OOM just by trying to transmit too fast > on a station interface :P In my setup, UDP RX (AP->STA) locks up the station hard and eventually I get this: [ 292.093359] BUG: soft lockup - CPU#0 stuck for 23s! [syslog-ng:403] [ 292.093555] irq event stamp: 12065793 [ 292.093560] hardirqs last enabled at (12065792): [<ffffffff81490855>] _raw_spin_unlock_irqrestore+0x65/0x80 [ 292.093579] hardirqs last disabled at (12065793): [<ffffffff814922ea>] apic_timer_interrupt+0x6a/0x80 [ 292.093591] softirqs last enabled at (4271834): [<ffffffff81059507>] __do_softirq+0x137/0x240 [ 292.093605] softirqs last disabled at (4273355): [<ffffffff81492cec>] call_softirq+0x1c/0x30 [ 292.093617] CPU 0 [ 292.093826] Pid: 403, comm: syslog-ng Tainted: G O 3.4.0-wl #21 LENOVO 7661GN4/7661GN4 [ 292.093841] RIP: 0010:[<ffffffff8149085a>] [<ffffffff8149085a>] _raw_spin_unlock_irqrestore+0x6a/0x80 [ 292.093855] RSP: 0018:ffff88007d403c90 EFLAGS: 00000282 [ 292.093862] RAX: ffff880037526dd0 RBX: 0000000000000000 RCX: 0000000000000002 [ 292.093869] RDX: 0000000000000002 RSI: ffff880037527428 RDI: 0000000000000282 [ 292.093876] RBP: ffff88007d403ca0 R08: ffff8800755f5048 R09: 0000000000000005 [ 292.093883] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007d403c08 [ 292.093889] R13: ffffffff814922ef R14: ffff88007d403ca0 R15: ffffffff81a65c80 [ 292.093897] FS: 00007f6c0dc5db00(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 [ 292.093905] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 292.093911] CR2: 00007f35a1fbe1c3 CR3: 0000000075490000 CR4: 00000000000007f0 [ 292.093918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 292.093925] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 292.093933] Process syslog-ng (pid: 403, threadinfo ffff8800789de000, task ffff880037526dd0) [ 292.093939] Stack: [ 292.093943] ffff8800755f5000 0000000000000282 ffff88007d403cc0 ffffffff81270aa2 [ 292.093960] 00000000000007b6 0000000000000040 ffff88007d403d10 ffffffff812720fe [ 292.093975] 0000000000000000 0000000001000000 ffff88007d403d10 ffff880075155700 [ 292.093990] Call Trace: [ 292.093995] <IRQ> [ 292.094006] [<ffffffff81270aa2>] dma_entry_alloc+0x62/0x90 [ 292.094017] [<ffffffff812720fe>] debug_dma_map_page+0x7e/0x140 [ 292.094037] [<ffffffffa06b3669>] ath_rx_tasklet+0x969/0x1f10 [ath9k] [ 292.094056] [<ffffffffa06b0054>] ath9k_tasklet+0x104/0x180 [ath9k] [ 292.094068] [<ffffffff8105a073>] tasklet_action+0x83/0x110 [ 292.094079] [<ffffffff81059498>] __do_softirq+0xc8/0x240 [ 292.094091] [<ffffffff81492cec>] call_softirq+0x1c/0x30 [ 292.094102] [<ffffffff81016855>] do_softirq+0xa5/0xe0 [ 292.094113] [<ffffffff81059986>] irq_exit+0xa6/0xe0 [ 292.094124] [<ffffffff81493583>] do_IRQ+0x63/0xe0 [ 292.094134] [<ffffffff81490cef>] common_interrupt+0x6f/0x6f [ 292.094140] <EOI> [ 292.094150] [<ffffffff810aeafc>] ? mark_held_locks+0x8c/0x110 [ 292.094162] [<ffffffff8149085a>] ? _raw_spin_unlock_irqrestore+0x6a/0x80 [ 292.094173] [<ffffffff8107fd23>] __wake_up+0x53/0x70 [ 292.094196] [<ffffffffa022af3c>] jbd2_journal_stop+0x24c/0x2c0 [jbd2] [ 292.094246] [<ffffffffa0275788>] __ext4_journal_stop+0x78/0xa0 [ext4] [ 292.094278] [<ffffffffa025a5ad>] ext4_da_write_end+0xad/0x390 [ext4] [ 292.094293] [<ffffffff81116d03>] generic_file_buffered_write+0x1a3/0x2c0 [ 292.094309] [<ffffffff8148dba7>] ? mutex_lock_nested+0x2f7/0x3d0 [ 292.094321] [<ffffffff81118c4a>] __generic_file_aio_write+0x22a/0x440 [ 292.094334] [<ffffffff81118ed3>] generic_file_aio_write+0x73/0xe0 [ 292.094363] [<ffffffffa025136f>] ext4_file_write+0xaf/0x270 [ext4] [ 292.094374] [<ffffffff810adcf0>] ? lock_release_non_nested+0xa0/0x2e0 [ 292.094403] [<ffffffffa02512c0>] ? ext4_file_mmap+0x60/0x60 [ext4] [ 292.094415] [<ffffffff8117c472>] do_sync_readv_writev+0xd2/0x110 [ 292.094427] [<ffffffff8113bd23>] ? might_fault+0x53/0xb0 [ 292.094438] [<ffffffff8120aa0c>] ? security_file_permission+0x2c/0xb0 [ 292.094449] [<ffffffff8117bb91>] ? rw_verify_area+0x61/0xf0 [ 292.094460] [<ffffffff8117c75b>] do_readv_writev+0xdb/0x1f0 [ 292.094470] [<ffffffff810adcf0>] ? lock_release_non_nested+0xa0/0x2e0 [ 292.094481] [<ffffffff8113bd23>] ? might_fault+0x53/0xb0 [ 292.094491] [<ffffffff814917d5>] ? sysret_check+0x22/0x5d [ 292.094502] [<ffffffff8117c8a5>] vfs_writev+0x35/0x60 [ 292.094511] [<ffffffff8117ca3a>] sys_writev+0x4a/0xc0 [ 292.094523] [<ffffffff814917a9>] system_call_fastpath+0x16/0x1b [ 292.094530] Code: ff 65 48 8b 04 25 70 c8 00 00 83 a8 44 e0 ff ff 01 48 8b 80 38 e0 ff ff a8 08 75 17 5b 41 5c 5d c3 e8 bb e4 c1 ff 48 89 df 57 9d <66> 66 90 66 90 90 eb ce e8 79 e9 ff ff eb e2 0f 1f 80 00 00 00 Or, this is spewed in the log: [ 1934.719823] INFO: rcu_preempt self-detected stall on CPU { 1} (t=18000 jiffies) [ 1934.719830] sending NMI to all CPUs: [ 1934.719830] NMI backtrace for cpu 1 [ 1934.719830] CPU 1 [ 1934.719830] Pid: 0, comm: swapper/1 Tainted: G O 3.4.0-wl #21 LENOVO 7661GN4/7661GN4 [ 1934.719830] RIP: 0010:[<ffffffff8125e302>] [<ffffffff8125e302>] __const_udelay+0x12/0x40 [ 1934.719830] RSP: 0018:ffff88007d503600 EFLAGS: 00000006 [ 1934.719830] RAX: 0000000000000046 RBX: 0000000000002710 RCX: 000000000000f7f6 [ 1934.719830] RDX: 00000000005b54ce RSI: 0000000000000002 RDI: 0000000000418958 [ 1934.719830] RBP: ffff88007d503600 R08: 0000000000000002 R09: 0000000000000000 [ 1934.719830] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81a397c0 [ 1934.719830] R13: ffffffff81a39880 R14: ffff88007d50e7c0 R15: 000001c27649776b [ 1934.719830] FS: 0000000000000000(0000) GS:ffff88007d500000(0000) knlGS:0000000000000000 [ 1934.719830] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1934.719830] CR2: 00007ffbf709408a CR3: 0000000001a0b000 CR4: 00000000000007e0 [ 1934.719830] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1934.719830] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1934.719830] Process swapper/1 (pid: 0, threadinfo ffff880079840000, task ffff880079832f10) [ 1934.719830] Stack: [ 1934.719830] ffff88007d503620 ffffffff8103698a 0000000000000002 ffff88007d50ec80 [ 1934.719830] ffff88007d503670 ffffffff810e0bf3 0000000000000001 ffff880079832f10 [ 1934.719830] ffff88007d503690 0000000000000001 0000000000000001 0000000000000000 [ 1934.719830] Call Trace: [ 1934.719830] <IRQ> [ 1934.719830] [<ffffffff8103698a>] arch_trigger_all_cpu_backtrace+0x6a/0x90 [ 1934.719830] [<ffffffff810e0bf3>] __rcu_pending+0x193/0x4e0 [ 1934.719830] [<ffffffff810e0fba>] rcu_pending+0x7a/0x90 [ 1934.719830] [<ffffffff810e20c9>] rcu_check_callbacks+0xd9/0x220 [ 1934.719830] [<ffffffff81062ad8>] update_process_times+0x48/0x90 [ 1934.719830] [<ffffffff810a79e4>] tick_sched_timer+0x64/0xc0 [ 1934.719830] [<ffffffff810799d7>] __run_hrtimer+0x77/0x270 [ 1934.719830] [<ffffffff810a7980>] ? tick_nohz_handler+0x110/0x110 [ 1934.719830] [<ffffffff8107a6f7>] hrtimer_interrupt+0xd7/0x1f0 [ 1934.719830] [<ffffffff81493669>] smp_apic_timer_interrupt+0x69/0x99 [ 1934.719830] [<ffffffff814922ef>] apic_timer_interrupt+0x6f/0x80 [ 1934.719830] [<ffffffff810a9eaa>] ? lock_is_held+0xaa/0xd0 [ 1934.719830] [<ffffffff8138e00c>] skb_dst_set_noref+0x4c/0x80 [ 1934.719830] [<ffffffff813b826e>] ip_route_input_common+0xbce/0xf20 [ 1934.719830] [<ffffffff813b76a0>] ? ip_route_output_flow+0x70/0x70 [ 1934.719830] [<ffffffff813ba922>] ip_rcv_finish+0x3d2/0x570 [ 1934.719830] [<ffffffff813bb254>] ip_rcv+0x214/0x2e0 [ 1934.719830] [<ffffffff81384836>] __netif_receive_skb+0x5d6/0x690 [ 1934.719830] [<ffffffff81384351>] ? __netif_receive_skb+0xf1/0x690 [ 1934.719830] [<ffffffff8135aa07>] ? led_trigger_event+0x27/0x90 [ 1934.719830] [<ffffffff81384f53>] netif_receive_skb+0x33/0x110 [ 1934.719830] [<ffffffffa0405d92>] ? ieee80211_data_to_8023+0x182/0x440 [cfg80211] [ 1934.719830] [<ffffffffa05b5c53>] ieee80211_deliver_skb.isra.23+0xa3/0x200 [mac80211] [ 1934.719830] [<ffffffffa05b6b2b>] ieee80211_rx_handlers+0xd7b/0x2430 [mac80211] [ 1934.719830] [<ffffffff810aec2c>] ? trace_hardirqs_on_caller+0xac/0x190 [ 1934.719830] [<ffffffff810aed1d>] ? trace_hardirqs_on+0xd/0x10 [ 1934.719830] [<ffffffff81490867>] ? _raw_spin_unlock_irqrestore+0x77/0x80 [ 1934.719830] [<ffffffffa05b8331>] ieee80211_prepare_and_rx_handle+0x151/0xa70 [mac80211] [ 1934.719830] [<ffffffffa05b8f8d>] ieee80211_rx+0x33d/0x8d0 [mac80211] [ 1934.719830] [<ffffffffa05b8cf6>] ? ieee80211_rx+0xa6/0x8d0 [mac80211] [ 1934.719830] [<ffffffff810aeafc>] ? mark_held_locks+0x8c/0x110 [ 1934.719830] [<ffffffffa073e994>] ath_rx_tasklet+0xd24/0x1f10 [ath9k] [ 1934.719830] [<ffffffffa073b014>] ath9k_tasklet+0xe4/0x160 [ath9k] [ 1934.719830] [<ffffffff8105a073>] tasklet_action+0x83/0x110 [ 1934.719830] [<ffffffff81059498>] __do_softirq+0xc8/0x240 [ 1934.719830] [<ffffffff81492cec>] call_softirq+0x1c/0x30 [ 1934.719830] [<ffffffff81016855>] do_softirq+0xa5/0xe0 [ 1934.719830] [<ffffffff81059986>] irq_exit+0xa6/0xe0 [ 1934.719830] [<ffffffff81493583>] do_IRQ+0x63/0xe0 [ 1934.719830] [<ffffffff81490cef>] common_interrupt+0x6f/0x6f [ 1934.719830] <EOI> [ 1934.719830] [<ffffffff8101ce63>] ? native_sched_clock+0x13/0x80 [ 1934.719830] [<ffffffffa032d332>] ? acpi_idle_enter_bm+0x26b/0x2b2 [processor] [ 1934.719830] [<ffffffffa032d32e>] ? acpi_idle_enter_bm+0x267/0x2b2 [processor] [ 1934.719830] [<ffffffff8135848e>] cpuidle_enter+0x1e/0x20 [ 1934.719830] [<ffffffff81358ad6>] cpuidle_idle_call+0xa6/0x340 [ 1934.719830] [<ffffffff8101eee5>] cpu_idle+0xc5/0x140 [ 1934.719830] [<ffffffff8147d376>] start_secondary+0x208/0x20f [ 1934.719830] Code: 66 90 ff 15 39 6f 80 00 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 65 48 8b 14 25 98 3c 01 00 <48> 8d 0c 92 48 8d 04 bd 00 00 00 00 48 89 ca 48 c1 e2 04 48 29 I am trying to see which option in the 'kernel hacking' section causes this, because using my distro's stock kernel doesn't have any trouble in transmitting/receiving large amounts of data. (Probably CONFIG_DEBUG_ATOMIC_SLEEP). > PS. If anyone knows how to make minicom ignore page > refreshes so that it doesn't obscure the last bit of > the crash log when BIOS starts up again, please let me know! There's a capture option in minicom. Sujith -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html