On Thu, Jul 31, 2008 at 12:15 PM, Pekka J Enberg <penberg@xxxxxxxxxxxxxx> wrote: > Hi Andrew, > > On Wed, 30 Jul 2008, Andrew Morton wrote: >> > Ok here it is. >> > BTW, I run "klogd -c 7" after boot >> >> The sysrq output is still missing lots of stuff. I guess we broke it. >> >> > >> > This time I get a kmalloc poison overwritten: >> > >> >> argh, that stuff hurts my brain. None of the numbers seem to make any >> sense for a 4k allocation :( Pekka, do you have time to decrypt this? > > Sure. Here goes: > > On Wed, 30 Jul 2008, Andrew Morton wrote: >> <fixes wordwrapping, cleans stuff up> >> >> ============================================================================= >> BUG kmalloc-4096: Poison overwritten >> ----------------------------------------------------------------------------- >> >> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b > > That's POISON_FREE ("0x6b") overwritten which means use-after-free for > the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the > object is okay but the SLUB debugging code only dumps the first 128 bytes > of the object which is why we don't see the full corruption. > > 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1? > >> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 >> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 > > So the corrupted object was free'd by skb_release_data() so we need to > look for a driver or the networking stack calling that function too early. > >> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 >> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 >> >> Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ > > The object starts here (the poison values for first 32 bytes are okay): > >> Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk >> Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > And here are the first 96 bytes of the corruption: > >> Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ >> Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@xxx!.z!... >> Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. >> Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... >> Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` >> Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... > > But I think that's just the payload of a SKB? > >> Redzone 0xf6f3b060: bb bb bb bb »»»» > > The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces > use-after-free. > >> Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ >> Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 >> [<c0180f5d>] print_trailer+0xad/0xf0 >> [<c018103b>] check_bytes_and_report+0x9b/0xc0 >> [<c018145e>] check_object+0x19e/0x1e0 >> [<c01821a4>] __slab_alloc+0x454/0x4f0 >> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 >> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >> [<c03dce79>] __alloc_skb+0x49/0x100 >> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 >> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] >> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] >> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 >> [<c012eafc>] tasklet_action+0x4c/0xc0 >> [<c012e463>] __do_softirq+0x93/0x120 >> [<c012e547>] do_softirq+0x57/0x60 >> [<c012ea29>] irq_exit+0x69/0x80 >> [<c0106b55>] do_IRQ+0x45/0x80 >> [<c010a5d0>] ? mwait_idle+0x0/0x50 >> [<c0104752>] common_interrupt+0x2e/0x34 >> [<c010a5d0>] ? mwait_idle+0x0/0x50 >> [<c010a609>] ? mwait_idle+0x39/0x50 >> [<c01026e0>] cpu_idle+0x60/0xd0 >> [<c043c8ce>] rest_init+0x4e/0x60 >> ======================= >> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b >> >> FIX kmalloc-4096: Marking all objects used >> [<c0243b4f>] ? security_file_permission+0xf/0x20 >> [<c019436f>] sys_select+0x3f/0x190 >> [<c01878e9>] ? fput+0x19/0x20 >> [<c0103dbf>] ? restore_nocheck+0x12/0x15 >> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 >> [<c0103d5e>] syscall_call+0x7/0xb >> ======================= >> >> Dave, could you please remind us which net driver was in use here? > > There's ath5k in the stack trace but that, of course, doesn't > automatically mean it's at fault here. It could have been just the poor > bastard who was the next to allocate 4 KB with kmalloc() noticing the > corruption. > > Hope this helps! I've seen something similar with fragmentation code in iwl4965 but I can reproduce it only when using SLAB. With SLUB it didn't shown up. Does anyone know what is difference between SLAB and SLUB in this context.? Tomas -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html