Christian Lamparter wrote: > On Monday 19 January 2009 19:15:09 Artur Skawina wrote: >> Christian Lamparter wrote: >>> On Monday 19 January 2009 00:27:02 Artur Skawina wrote: >>>> Artur Skawina wrote: >>>>> didn't trigger anything here, just the usual: >>>>> >>>>> BUG kmalloc-4096: Poison overwritten >>>> This is almost 100% reproducible; sometimes the machine freezes instead. >> Here's an interesting sequence: >> >> 1) a TX urb is submitted. >> 2) p54u_rx_cb() => p54_rx_frame_sent(), which does kfree_skb( the_skb_in_(1) ). >> 3) p54u_tx_cb() for (1) is called with the same, now freed, skb. kaboom. >> >> IOW the skb is freed before the usb completion runs. > > Well, the sequence should be: > > 1) p54_tx gets called > 1.1) one IRQ urb is submitted > 1.2) one BULK urb is submitted > 2) the firmware acks that it got the urbs > 2.1) p54u_tx_cb is called for the IRQ urb. which frees the small buffer > 2.2) p54u_tx_cb is called for the BULK urb. which only removes the net2280_tx_hdr from the skb. > [time passes] > 3) firmware is finished sending. > 3.1) p54u_rx_cb gets called > => p54_rx_frame_sent passed the feedback to mac80211 That's what one would expect, and is probably why i couldn't see anything wrong in the code despite going over it several times. Until i got a crash which left no doubt as to what happened, and made me notice the "wrong" completion order, log attached [1]. In theory, theory and practice do not differ, in practice... >> Somehow i don't think this is the reason for the corruption, but it certainly >> seems to be responsible for some, if not all, of the crashes/panics. > dunno... we should see a bit more fallout, because skb_pull changes skb->data and skb->len. Doing an skb_pull in p54u_tx_cb on skbs that have already been given to mac80211 cannot be good. We can move the FREE_AFTER_TX(skb) check from the completion to the submission path, right? Then find a way to do the pull _before_ giving away the skbs. I can't shutdown the machine where i can reproduce this today, so it will have to wait until at least tomorrow. artur [1]: note there are two issues, first the skb_pull triggers a BUG, which is then interrupted by the SLUB error, then the rest of the first BUG follows. i've added '#'-comments explaining some of the numbers. [...] TX: LEN= 20 SKB=d7eb2b60 HEAD=d78f28c0 TX: DONE TX: LEN= 32 SKB=da7c0620 HEAD=de02b3b0 TX: DONE TX: LEN= 436 SKB=d7eb2ee0 HEAD=dd4ee180 TXCB: LEN= 36 SKB=d7eb2b60 HEAD=d78f28c0 TXCB: LEN= 48 SKB=da7c0620 HEAD=de02b3b0 TX: DONE TX: LEN= 436 SKB=dd55e2a0 HEAD=dd4ef670 TX: DONE TX: LEN= 176 SKB=de01e8c0 HEAD=d9812060 TX: DONE TXCB: LEN= 452 SKB=d7eb2ee0 HEAD=dd4ee180 TXCB: LEN= 452 SKB=dd55e2a0 HEAD=dd4ef670 TXCB: LEN= 192 SKB=de01e8c0 HEAD=d9812060 RXCB: LEN= 36 SKB=dd5dfee0 HEAD=d9813090 RXCB: LEN= 36 SKB=dd433700 HEAD=d9811030 RXCB: LEN= 36 SKB=dfbf3a80 HEAD=d9816120 TX: LEN= 160 SKB=dfb2da80 HEAD=d9889030 # (1) urb submission, SKB=dfb2da80 TX: DONE TX: LEN= 168 SKB=ce1388c0 HEAD=d9888000 RXCB: LEN= 36 SKB=ce138d20 HEAD=d98140c0 # (2) this eventually calls p54_rx_frame_sent() and skb_kfree on SKB=dfb2da80 TXCB: LEN=1802201963 SKB=dfb2da80 HEAD=6b6b6b6b # (3) p54u_tx_cb() was called with the now freed and poisoned SKB=dfb2da80 ------------[ cut here ]------------ kernel BUG at include/linux/skbuff.h:1073! # BUG() in skb_pull(), see stacktrace below, after the SLUB error invalid opcode: 0000 [#1] DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci0000:00/0000:00:07.2/usb1/1-1/1-1.1/uevent Modules linked in: p54usb p54common<3>============================================================================= BUG skbuff_head_cache: Poison overwritten ----------------------------------------------------------------------------- INFO: 0xdfb2dad0-0xdfb2dad0. First byte 0x5b instead of 0x6b # 0x10 is the amount that was pulled (net2280 rx header) INFO: Allocated in __alloc_skb+0x34/0x110 age=28 cpu=0 pid=0 INFO: Freed in p54_rx+0x760/0x770 [p54common] age=0 cpu=0 pid=0 # p54_rx+0x760/0x770 is the skb_free in p54_rx_frame_sent() (gcc decided to inline it) INFO: Slab 0xc13f65a0 objects=18 used=16 fp=0xdfb2da80 flags=0x400000c2 INFO: Object 0xdfb2da80 @offset=2688 fp=0xdfb2dee0 # the freed skb Bytes b4 0xdfb2da70: 00 00 00 00 0f 23 00 00 ba e7 1b 02 5a 5a 5a 5a .....#..ºç..ZZZZ Object 0xdfb2da80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2da90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2daa0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2dab0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2dac0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2dad0: 5b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b [kkkkkkkkkkkkkkk Object 0xdfb2dae0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2daf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2db00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2db10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2db20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xdfb2db30: 6b 6b 6b a5 kkk¥ Redzone 0xdfb2db34: bb bb bb bb »»»» Padding 0xdfb2db5c: 5a 5a 5a 5a ZZZZ Pid: 0, comm: swapper Tainted: G W 2.6.29-rc1-00222-g27956bf-dirty #69 Call Trace: [<c016cf1e>] check_bytes_and_report+0xce/0xf0 [<c016d177>] check_object+0x1e7/0x230 [<c016e915>] __slab_alloc+0x245/0x4f0 [<c03afe04>] __alloc_skb+0x34/0x110 [<c016f0eb>] kmem_cache_alloc+0xbb/0xc0 [<c03afe04>] __alloc_skb+0x34/0x110 [<c03afe04>] __alloc_skb+0x34/0x110 [<c03c5695>] find_skb+0x35/0x90 [<c03c58ae>] netpoll_send_udp+0x2e/0x200 [<e31371ad>] write_msg+0x9d/0xe0 [netconsole] [<e3137110>] write_msg+0x0/0xe0 [netconsole] [<c0122533>] __call_console_drivers+0x43/0x50 [<c01228f7>] release_console_sem+0x87/0x1c0 [<c0122ec7>] vprintk+0x227/0x2d0 [<e3137110>] write_msg+0x0/0xe0 [netconsole] [<c04dcb00>] printk+0x17/0x1f [<c0142ea4>] print_modules+0x74/0xb0 [<c01051ba>] show_registers+0x1a/0x200 [<c0138647>] atomic_notifier_call_chain+0x17/0x20 [<c0138740>] notify_die+0x30/0x40 [<c0105df6>] __die+0xa6/0x100 [<c0106070>] die+0x40/0x80 [<c0103f60>] do_invalid_op+0x0/0xa0 [<c0103fe1>] do_invalid_op+0x81/0xa0 [<c03aced8>] skb_pull+0x18/0x40 [<e3137110>] write_msg+0x0/0xe0 [netconsole] [<c0122533>] __call_console_drivers+0x43/0x50 [<c013ad72>] update_wall_time+0x492/0x8e0 [<c04de722>] error_code+0x6a/0x70 [<c016007b>] sys_mincore+0x26b/0x350 [<c03aced8>] skb_pull+0x18/0x40 [<e1f1d815>] p54u_tx_cb+0x95/0x120 [p54usb] [<c0341489>] usb_hcd_giveback_urb+0x49/0xe0 [<c034fe4f>] uhci_giveback_urb+0x5f/0x1f0 [<c034fe6f>] uhci_giveback_urb+0x7f/0x1f0 [<c035043f>] uhci_scan_schedule+0x33f/0xa40 [<c035278d>] uhci_irq+0x7d/0x140 [<c0341125>] usb_hcd_irq+0x25/0x60 [<c0146de8>] handle_IRQ_event+0x28/0x50 [<c0148000>] handle_level_irq+0x0/0xb0 [<c014804b>] handle_level_irq+0x4b/0xb0 <IRQ> [<c0103707>] common_interrupt+0x27/0x2c [<c0122dbc>] vprintk+0x11c/0x2d0 [<c034301f>] usb_alloc_urb+0xf/0x30 [<c034301f>] usb_alloc_urb+0xf/0x30 [<c04dcb00>] printk+0x17/0x1f [<e1f1d212>] p54u_tx_net2280+0x52/0x320 [p54usb] [<e1d4845d>] p54_tx+0x51d/0x820 [p54common] [<c04ca789>] __ieee80211_tx+0x49/0x130 [<c04cb61d>] ieee80211_master_start_xmit+0x28d/0x610 [<c0137401>] __run_hrtimer+0x31/0xf0 [<c013e894>] tick_program_event+0x14/0x20 [<c013b311>] getnstimeofday+0x51/0x120 [<c03b702b>] dev_hard_start_xmit+0x1db/0x240 [<c03c883b>] __qdisc_run+0x1ab/0x200 [<c03b4dd1>] net_tx_action+0x81/0xb0 [<c0126f07>] __do_softirq+0x57/0xf0 [<c0126eb0>] __do_softirq+0x0/0xf0 <IRQ> [<c0137c38>] ktime_get+0x18/0x40 [<c0148000>] handle_level_irq+0x0/0xb0 [<c0126c65>] irq_exit+0x55/0x70 [<c0104f1b>] do_IRQ+0x9b/0x120 [<c0103707>] common_interrupt+0x27/0x2c [<c0108392>] default_idle+0x42/0x50 [<c0102139>] cpu_idle+0x39/0x80 FIX skbuff_head_cache: Restoring 0xdfb2dad0-0xdfb2dad0=0x6b FIX skbuff_head_cache: Marking all objects used netconsole budget saa7134_empress saa6752hs saa7134 serio_raw budget_ci videobuf_dma_sg budget_core saa7146 ttpci_eeprom videobuf_core tveeprom ir_common [last unloaded: p54common] Pid: 0, comm: swapper Tainted: G W (2.6.29-rc1-00222-g27956bf-dirty #69) EIP: 0060:[<c03aced8>] EFLAGS: 00010087 CPU: 0 EIP is at skb_pull+0x18/0x40 EAX: 6b6b6b5b EBX: 6b6b6b6b ECX: dfb2da80 EDX: 00000010 ESI: dfb2da80 EDI: d744bdd0 EBP: d744b1f0 ESP: c0690ef0 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=c0690000 task=c05fc300 task.ti=c0646000) Stack: d980a370 e1f1d815 e1f1f440 6b6b6b6b dfb2da80 6b6b6b6b d744bf74 00000086 d980a370 00000000 df9fb180 d980a370 c0341489 c034fe4f 00000046 df965de0 df9fb180 df994900 dfa46a68 c034fe6f df9948bc df99491c 00a464d8 df9fb24c Call Trace: [<e1f1d815>] p54u_tx_cb+0x95/0x120 [p54usb] # this is the skb_pull in p54u_tx_cb [<c0341489>] usb_hcd_giveback_urb+0x49/0xe0 [<c034fe4f>] uhci_giveback_urb+0x5f/0x1f0 [<c034fe6f>] uhci_giveback_urb+0x7f/0x1f0 [<c035043f>] uhci_scan_schedule+0x33f/0xa40 [<c035278d>] uhci_irq+0x7d/0x140 [<c0341125>] usb_hcd_irq+0x25/0x60 [<c0146de8>] handle_IRQ_event+0x28/0x50 [<c0148000>] handle_level_irq+0x0/0xb0 [<c014804b>] handle_level_irq+0x4b/0xb0 <IRQ> <0> [<c0103707>] common_interrupt+0x27/0x2c [<c0122dbc>] vprintk+0x11c/0x2d0 [<c034301f>] usb_alloc_urb+0xf/0x30 [<c034301f>] usb_alloc_urb+0xf/0x30 [<c04dcb00>] printk+0x17/0x1f [<e1f1d212>] p54u_tx_net2280+0x52/0x320 [p54usb] [<e1d4845d>] p54_tx+0x51d/0x820 [p54common] [<c04ca789>] __ieee80211_tx+0x49/0x130 [<c04cb61d>] ieee80211_master_start_xmit+0x28d/0x610 [<c0137401>] __run_hrtimer+0x31/0xf0 [<c013e894>] tick_program_event+0x14/0x20 [<c013b311>] getnstimeofday+0x51/0x120 [<c03b702b>] dev_hard_start_xmit+0x1db/0x240 [<c03c883b>] __qdisc_run+0x1ab/0x200 [<c03b4dd1>] net_tx_action+0x81/0xb0 [<c0126f07>] __do_softirq+0x57/0xf0 [<c0126eb0>] __do_softirq+0x0/0xf0 <IRQ> <0> [<c0137c38>] ktime_get+0x18/0x40 [<c0148000>] handle_level_irq+0x0/0xb0 [<c0126c65>] irq_exit+0x55/0x70 [<c0104f1b>] do_IRQ+0x9b/0x120 [<c0103707>] common_interrupt+0x27/0x2c [<c0108392>] default_idle+0x42/0x50 [<c0102139>] cpu_idle+0x39/0x80 Code: 83 c4 0c c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 53 89 c1 8b 58 50 31 c0 39 d3 72 24 89 d8 29 d0 3b 41 54 89 41 50 73 08 <0f> 0b eb fe 8d 74 26 00 8b 99 a8 00 00 00 89 d0 01 d8 89 81 a8 EIP: [<c03aced8>] skb_pull+0x18/0x40 SS:ESP 0068:c0690ef0 ---[ end trace bba3fc5c928ed0e0 ]--- Kernel panic - not syncing: Fatal exception in interrupt -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html