(added lkml - so please keep the CC!)

On Tuesday 13 January 2009 22:39:00 Artur Skawina wrote:
> Artur Skawina wrote:
> >>> The machine has 512M, ~100M should be (usually is) free, is under constant light
> >>> load (typically <2k ints/s, 60% idle) and is running fine for weeks/months between
> >>> reboots, but locks up after only a few packets go over the hostap driven
> >>> p54usb device. I need the box to be up, that limits the number of tests i can
> >>> run, at least as long as the lockups w/o any diagnostics happen...
> >> Do keyboard-leds "flash" when it locks up, or does console respond
> >> if you press alt-sysrq-m / alt-sysrq-w on the connected keyboard?
> >
> > most of the times it happened there was no kbd attached. At least once
> > when it _was_ connected, sysrq was working, and i saw 0*8KB; that's why
> > i initially suspected fragmentation.
> >
> >> ( If your box has a serial port, you can try to get the logs from there... )
>
> after switching from SLUB to SLAB and enabling some debugging i finally caught this:

Argh, that's not good... I had hoped for an obvious BUG in p54 or
mac80211, not in some other part of the kernel. I have no idea what's
going on in the timer/mm part (but maybe someone else on lkml does?),
and I can't tell which BUG_ON fired: "cache_free_debugcheck" has about
3 of them (well, there are 4, but the first one is unlikely).

This smells like memory corruption. Have you tried enabling
CONFIG_DEBUG_SLAB? Is this related to the "truesize bug"? Or how long
does the box survive if you don't allow named to bind/listen to wlanX?

> ------------[ cut here ]------------
> Kernel BUG at c016a8a3 [verbose debug info unavailable]
> invalid opcode: 0000 [#1]
> last sysfs file: /sys/devices/pci0000:00/0000:00:07.2/usb1/1-1/1-1.1/uevent
> Modules linked in: netconsole saa7134_empress saa6752hs lnbp21 s5h1420 saa7134 budget videobuf_dma_sg budget_ci budget_core saa7146 ttpci_eeprom videobuf_core tveeprom serio_raw ir_common [last unloaded: netconsole]
>
> Pid: 1885, comm: named Not tainted (2.6.28-rc8-00519-g90435df #42)
> EIP: 0060:[<c016a8a3>] EFLAGS: 00210012 CPU: 0
> EIP is at cache_free_debugcheck+0x203/0x250
> EAX: dfb6c71f EBX: df803d20 ECX: dfb6c03f EDX: 00000002
> ESI: dfb6c720 EDI: 00000370 EBP: c1000000 ESP: c0669f74
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process named (pid: 1885, ti=c0669000 task=df8443d0 task.ti=deb85000)
> Stack:
> 00000000 df809660 d31d4528 00000003 00000000 00000002 c137c440 c060e2dc
> c01483e2 dfb6c000 df808d38 df803d20 c069cb40 00200286 c016a911 00000000
> 00000005 c069cb40 00000009 c01483e2 00000020 00000001 00000100 c014850f
> Call Trace:
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c016a911>] kmem_cache_free+0x21/0x60
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c014850f>] rcu_process_callbacks+0xf/0x20
> [<c0127a37>] __do_softirq+0x57/0xf0
> [<c01279e0>] __do_softirq+0x0/0xf0
> <IRQ> <0> [<c01277e5>] irq_exit+0x45/0x70
> [<c0112590>] smp_apic_timer_interrupt+0x40/0x70
> [<c0103d9c>] apic_timer_interrupt+0x28/0x30
> Code: 8b 44 24 24 b9 fe ff ff ff 89 4c 90 1c f6 43 19 08 74 0e b9 6b 00 00 00 89 f2 89 d8 e8 e7 fa ff ff 83 c4 28 89 f0 5b 5e 5f 5d c3 <0f> 0b eb fe 0f 0b eb fe 8b 43 10 8d 44 06 f8 8d b6 00 00 00 00
> EIP: [<c016a8a3>] cache_free_debugcheck+0x203/0x250 SS:ESP 0068:c0669f74
> Kernel panic - not syncing: Fatal exception in interrupt
>
> followed after some time by lots of page alloc failures [1].
>
> artur
>
> [1]
> [...]
> __ratelimit: 1551 callbacks suppressed
> named: page allocation failure. order:0, mode:0x20
> Pid: 1885, comm: named Tainted: G D 2.6.28-rc8-00519-g90435df #42
> Call Trace:
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> named: page allocation failure. order:0, mode:0x20
> Pid: 1885, comm: named Tainted: G D 2.6.28-rc8-00519-g90435df #42
> Call Trace:
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c016b573>] cache_alloc_refill+0x363/0x710
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c016bcc1>] kmem_cache_alloc+0xe1/0xf0
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c03b8205>] find_skb+0x35/0x90
> [<c03b840e>] netpoll_send_udp+0x2e/0x200
> [<e33661ad>] write_msg+0x9d/0xe0 [netconsole]
> [<e3366110>] write_msg+0x0/0xe0 [netconsole]
> [<c0123443>] __call_console_drivers+0x43/0x50
> [<c01238bb>] release_console_sem+0x13b/0x1c0
> [<c0123dd7>] vprintk+0x227/0x2d0
> [<c0123443>] __call_console_drivers+0x43/0x50
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c04c30c0>] printk+0x17/0x1f
> [<c0105909>] print_trace_address+0x49/0x60
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c01059a4>] dump_trace+0x84/0x100
> [<c0105fde>] show_trace+0x4e/0x60
> [<c04c2fc1>] dump_stack+0x6e/0x73
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c016b573>] cache_alloc_refill+0x363/0x710
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c03a539e>] __alloc_skb+0x10e/0x120
> [<c016ba6e>] __kmalloc_track_caller+0x14e/0x160
> [<c016bc53>] kmem_cache_alloc+0x73/0xf0
> [<c03a5da9>] dev_alloc_skb+0x19/0x30
> [<c03a52e5>] __alloc_skb+0x55/0x120
> [<c03a5da9>] dev_alloc_skb+0x19/0x30
> [<c02ced8e>] boomerang_rx+0x15e/0x520
> [<c02d04cf>] boomerang_interrupt+0x13f/0x480
> [<e109d6a9>] budget_ci_irq+0xa9/0x100 [budget_ci]
> [<c0103d9c>] apic_timer_interrupt+0x28/0x30
> [<c0146348>] handle_IRQ_event+0x28/0x50
> [<c0147600>] handle_level_irq+0x0/0xb0
> [<c014764b>] handle_level_irq+0x4b/0xb0
> <IRQ> [<c0103d6f>] common_interrupt+0x23/0x28
> [<c024007b>] prio_tree_right+0xab/0x100
> [<c02442f7>] delay_tsc+0x17/0x20
> [<c0244298>] __const_udelay+0x18/0x20
> [<c04c304a>] panic+0x84/0xe3
> [<c010584c>] oops_end+0x7c/0x90
> [<c01045d0>] do_invalid_op+0x0/0xa0
> [<c0104651>] do_invalid_op+0x81/0xa0
> [<c016a8a3>] cache_free_debugcheck+0x203/0x250
> [<c011d233>] __wake_up_common+0x43/0x70
> [<c04c4b82>] error_code+0x6a/0x70
> [<c016a8a3>] cache_free_debugcheck+0x203/0x250
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c016a911>] kmem_cache_free+0x21/0x60
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c014850f>] rcu_process_callbacks+0xf/0x20
> [<c0127a37>] __do_softirq+0x57/0xf0
> [<c01279e0>] __do_softirq+0x0/0xf0
> <IRQ> [<c01277e5>] irq_exit+0x45/0x70
> [<c0112590>] smp_apic_timer_interrupt+0x40/0x70
> [<c0103d9c>] apic_timer_interrupt+0x28/0x30
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 174
> Active_anon:13626 active_file:3702 inactive_anon:11682
> inactive_file:91928 unevictable:5 dirty:48 writeback:0 unstable:0
> free:737 slab:3377 mapped:2606 pagetables:219 bounce:0
> DMA free:2004kB min:84kB low:104kB high:124kB active_anon:24kB inactive_anon:28kB active_file:104kB inactive_file:8164kB unevictable:0kB present:15872kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 492 492
> Normal free:944kB min:2792kB low:3488kB high:4188kB active_anon:54480kB inactive_anon:46700kB active_file:14704kB inactive_file:359548kB unevictable:20kB present:503928kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
> Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 944kB
> 95760 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 530104kB
> Total swap = 530104kB
> 131070 pages RAM
> 2635 pages reserved
> 10978 pages shared
> 121856 pages non-shared
> named: page allocation failure. order:0, mode:0x20
> Pid: 1885, comm: named Tainted: G D 2.6.28-rc8-00519-g90435df #42
> Call Trace:
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c016b573>] cache_alloc_refill+0x363/0x710
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c016bcc1>] kmem_cache_alloc+0xe1/0xf0
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c03b739b>] refill_skbs+0x5b/0x70
> [<c03b81e9>] find_skb+0x19/0x90
> [<c0266d90>] bit_cursor+0x0/0x610
> [<c03b840e>] netpoll_send_udp+0x2e/0x200
> [<e33661ad>] write_msg+0x9d/0xe0 [netconsole]
> [<e3366110>] write_msg+0x0/0xe0 [netconsole]
> [<c0123443>] __call_console_drivers+0x43/0x50
> [<c01238bb>] release_console_sem+0x13b/0x1c0
> [<c0123dd7>] vprintk+0x227/0x2d0
> [<c0123443>] __call_console_drivers+0x43/0x50
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c04c30c0>] printk+0x17/0x1f
> [<c0105909>] print_trace_address+0x49/0x60
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c01059a4>] dump_trace+0x84/0x100
> [<c0105fde>] show_trace+0x4e/0x60
> [<c04c2fc1>] dump_stack+0x6e/0x73
> [<c01505cd>] __alloc_pages_internal+0x35d/0x470
> [<c016b573>] cache_alloc_refill+0x363/0x710
> [<c03a52c4>] __alloc_skb+0x34/0x120
> [<c03a539e>] __alloc_skb+0x10e/0x120
> [<c016ba6e>] __kmalloc_track_caller+0x14e/0x160
> [<c016bc53>] kmem_cache_alloc+0x73/0xf0
> [<c03a5da9>] dev_alloc_skb+0x19/0x30
> [<c03a52e5>] __alloc_skb+0x55/0x120
> [<c03a5da9>] dev_alloc_skb+0x19/0x30
> [<c02ced8e>] boomerang_rx+0x15e/0x520
> [<c02d04cf>] boomerang_interrupt+0x13f/0x480
> [<e109d6a9>] budget_ci_irq+0xa9/0x100 [budget_ci]
> [<c0103d9c>] apic_timer_interrupt+0x28/0x30
> [<c0146348>] handle_IRQ_event+0x28/0x50
> [<c0147600>] handle_level_irq+0x0/0xb0
> [<c014764b>] handle_level_irq+0x4b/0xb0
> <IRQ> [<c0103d6f>] common_interrupt+0x23/0x28
> [<c024007b>] prio_tree_right+0xab/0x100
> [<c02442f7>] delay_tsc+0x17/0x20
> [<c0244298>] __const_udelay+0x18/0x20
> [<c04c304a>] panic+0x84/0xe3
> [<c010584c>] oops_end+0x7c/0x90
> [<c01045d0>] do_invalid_op+0x0/0xa0
> [<c0104651>] do_invalid_op+0x81/0xa0
> [<c016a8a3>] cache_free_debugcheck+0x203/0x250
> [<c011d233>] __wake_up_common+0x43/0x70
> [<c04c4b82>] error_code+0x6a/0x70
> [<c016a8a3>] cache_free_debugcheck+0x203/0x250
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c016a911>] kmem_cache_free+0x21/0x60
> [<c01483e2>] __rcu_process_callbacks+0xd2/0x1f0
> [<c014850f>] rcu_process_callbacks+0xf/0x20
> [<c0127a37>] __do_softirq+0x57/0xf0
> [<c01279e0>] __do_softirq+0x0/0xf0
> <IRQ> [<c01277e5>] irq_exit+0x45/0x70
> [<c0112590>] smp_apic_timer_interrupt+0x40/0x70
> [<c0103d9c>] apic_timer_interrupt+0x28/0x30
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 174
> Active_anon:13626 active_file:3702 inactive_anon:11682
> inactive_file:91928 unevictable:5 dirty:48 writeback:0 unstable:0
> free:737 slab:3377 mapped:2606 pagetables:219 bounce:0
> DMA free:2004kB min:84kB low:104kB high:124kB active_anon:24kB inactive_anon:28kB active_file:104kB inactive_file:8164kB unevictable:0kB present:15872kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 492 492
> Normal free:944kB min:2792kB low:3488kB high:4188kB active_anon:54480kB inactive_anon:46700kB active_file:14704kB inactive_file:359548kB unevictable:20kB present:503928kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB
> Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 944kB
> 95760 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 530104kB
> Total swap = 530104kB
> 131070 pages RAM
> 2635 pages reserved
> 10978 pages shared
> 121856 pages non-shared
> named: page allocation failure. order:0, mode:0x20
> [...]
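
For anyone wanting to double-check the per-zone free lists in the Mem-Info dump (the "0*8kB" entries mentioned earlier in the thread), here is a small sketch that sums the blocks on each "N*SIZEkB" line and compares the result with the total printed by the kernel. The helper names parse_buddy_line and free_kb are made up for this illustration, and the two input strings are simply copied from the quoted log; nothing here comes from the kernel sources.

```python
import re

# Free-list lines copied verbatim from the quoted Mem-Info dump.
DMA_LINE = ("DMA: 1*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB "
            "1*512kB 1*1024kB 0*2048kB 0*4096kB = 2004kB")
NORMAL_LINE = ("Normal: 0*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB "
               "1*512kB 0*1024kB 0*2048kB 0*4096kB = 944kB")


def parse_buddy_line(line):
    """Split one free-list line into (zone, [(count, block_kb), ...], reported_kb).

    Hypothetical helper: accepts the line with or without a leading "> "
    quote marker.
    """
    line = line.lstrip("> ")
    zone, _, rest = line.partition(":")
    blocks = [(int(count), int(kb))
              for count, kb in re.findall(r"(\d+)\*(\d+)kB", rest)]
    reported = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), blocks, reported


def free_kb(blocks, min_block_kb=0):
    """Sum the free memory held in blocks of at least min_block_kb."""
    return sum(count * kb for count, kb in blocks if kb >= min_block_kb)


if __name__ == "__main__":
    for text in (DMA_LINE, NORMAL_LINE):
        zone, blocks, reported = parse_buddy_line(text)
        print(zone, "computed:", free_kb(blocks), "kB,",
              "reported:", reported, "kB")
```

Both sums agree with the totals in the log (2004 kB and 944 kB), and together they come to 2948 kB, which matches "free:737" pages at 4 kB each. Note that the failing requests are order:0, and the Normal zone reports free:944kB against min:2792kB, so the numbers look more like a zone that has dropped below its watermark than like fragmentation alone; that reading is an interpretation of the dump, not something the log states.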