On Sat, 2011-03-05 at 16:59 -0500, Andy Walls wrote: > On Thu, 2011-03-03 at 21:06 -0500, Andy Walls wrote: > > Hi, > > > > I got a BUG when loading the cx18.ko module (which in turn requests the > > cx18-alsa.ko module) on a kernel built from this repository > > > > http://git.linuxtv.org/media_tree.git staging/for_v2.6.39 > > > > which I beleive is based on 2.6.38-rc2. > > [snip] > > > So here is my transcription of a fuzzy digital photo of the screen: > > > > kernel BUG at /home/andy/cx18dev/git/media_tree/mm/mmap.c:2309! > > invalid opcode: 0000 [#1] SMP > > last sysfs file: /sys/module/snd_pcm/initstate > > Modules linked in: tda9887 tda8290 mxl5005s s5h1409 tuner_simple ... > > ... > > Pid: 2580, comm: udevd Not tainted 2.6.38-rc2-cx18-vb2-proto+ > > RIP: 0010:[<ffffffff810eb50b>] [<ffffffff810eb50b>] exit_mmap+0x10f/0x11e > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0020000000000000 > > RDX: 0000000000160011 RSI: ffffea____c42___ RDI: 0000000000000202 > > RBP: ffff____18c1f_58 R08: ffff____________ R09: 0000000000000004 > > R10: ffff_______bb_38 R11: 0000000000000000 R12: ffff____344a6680 > > R13: 00007fff22______ R14: ffff____________ R15: 0000000000000001 > > ... > > CR2: 0000000000000000 ... > > .... > > Process udevd (pid: 25__, threadinfo ffff________, ... > > Stack: > > 000000000000015e ffff00003bc0e1d0 0000000000000246 .... > > ..... > > Call Trace: > > ... mmput+0x63/0xcf > > ... exit_mm+0x132/0x13f > > ... do_exit+0x238/0x749 > > ... ? __dequeue_signal+0xfa/0x12f > > ... do_group_exit+0x7d/0xa5 > > ... get_signal_to_deliver+0x371/0x395 > > ... do_signal+0x72/0x692 > > ... ? do_page_fault+0x24a/0x391 > > ... ? printk+0x41/0x47 > > ... ? sigprocmask+0xa3/0xcd > > ... do_notify_resume+0x2c/0x64 > > ... retint_signal+0x48/0x8c > > > > Code: ff ff 48 8b 7d d8 4c 89 ea 31 f6 e8 3e fe ff ff 48 89 df e8 78 fe > > ff ff 48 85 c0 48 89 c3 75 f0 49 83 bc 24 e0 00 00 00 00 74 04 <0f> 0b > > eb fe 48 83 c4 18 5b 41 5c 41 5d c9 c3 55 48 89 e5 41 57 > > RIP [<ffffffff810eb50b>] exit_mmap+0x10f/0x11e > > RSP <ffff880018c1fc28> > > general protection fault: 0000 [#2] SMP > > last sysfs file: /sys/devices/virtual/sound/card2/uevent > > CPU 1 > > Modules linked in: cx18-alsa tda9887 tda8290 mxl5005s s5h1409 > > tuner_simple tuner_types cs5345 tuner cx18 dvb_core cx2341x v4l2_common > > videodev v4l2_compat_ioctl32 > > > I'm dumping all my previous assumtpions about this BUG. After a bit of > reading, all I can say is that it's a page table deallocation problem at > process exit. After all the page table deallocations on exit, > mm->nr_ptes is still > 0, and that's a bad thing. > > It apparently happened in a child udevd exiting shortly after cx18.ko > loaded. The cx18 driver allocating large amounts kernel memory for DMA > buffers upon load may be related to triggering the problem, but I doubt > it is a root cause of the BUG. > > > This monsterous thread from 5 years ago is somewhat enlightening: > > http://lkml.indiana.edu/hypermail/linux/kernel/0503.2/1680.html > http://lkml.indiana.edu/hypermail/linux/kernel/0503.2/1787.html > > so it gives me a place to start looking for the problem. > > Any advice on what data to collect is appreciated. When attemtping to reproduce this BUG, I got another bug related to memory management: (Details handtyped from a photo): BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff010f22fa>] remove_vm_area+0x42/0x77 PGD 37cdd067 PUD 336c__67 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:14.4/0000:03:00.0/firmware/0000:03:00.0/loading CPU 0 Modules linked in: tda9887 tda8290 mxl5005s s5h1409 tuner_simple tuner_types cx5345 tuner cx18(+) dvb_core cx2341x ... Pid: 2470, comm: work_for_cpu Tainted: G W 2.6.28-rc2-cx18-vb2-proto+ RIP: 0010:[<ffffffff010f22fa>] [<ffffffff010f22fa>] remove_vm_area+0x42/0x77 ... RAX: 0000000000000000 RBX: ffff____35e7c540 RCX: 0000000000001000 RDX: 0000000000000000 .... ... CR2: 0000000000000000 .... Stack: ffff__0011485968 000000000000001 ffff____1147dc9_ ffffffff_1_f23__ .... Call Trace: ... __vunmap+0x3e/0xbd ... vfree+0x2e/0x30 ... dvb_dmx_init+0x7e/0x253 [dvb_core] ... cx18_dvb_register+0xd2/0x75c [cx18] ... cx18_streams_resgister+0x6a/0x26a [cx18] ... cx18_streams_setup+0x3cc/0x486 [cx18] ... cx18_probe+0x11cc/0x12fb [cx18] ...... The code appears to be failing here: /home/andy/cx18dev/git/media_tree/mm/vmalloc.c:1352 161d: eb 06 jmp 1625 <remove_vm_area+0x45> 161f: 48 89 c2 mov %rax,%rdx 1622: 48 8b 00 mov (%rax),%rax <--- Oops p = &tmp->next) (tmp = *p) 1625: 48 39 d8 cmp %rbx,%rax (tmp = *p) != vm; 1628: 75 f5 jne 161f <remove_vm_area+0x3f> /home/andy/cx18dev/git/media_tree/mm/vmalloc.c:1354 Corresponding to this code in mm/vmalloc.c: struct vm_struct *remove_vm_area(const void *addr) { struct vmap_area *va; va = find_vmap_area((unsigned long)addr); if (va && va->flags & VM_VM_AREA) { struct vm_struct *vm = va->private; struct vm_struct *tmp, **p; /* * remove from list and disallow access to this vm_struct * before unmap. (address range confliction is maintained by * vmap.) */ write_lock(&vmlist_lock); for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next) <--- Ooops ; [...] That for() loop appears to assume the vm_struct will be on the vmlist somewhere. If it isn't, then I suppose the for() loop could end up doing a NULL dereference. This BUG happened in the final stages of the cx18 driver setting up a CX23418 card instance. I have 2 cards in this machine, so a number of buffers had certainly been allocated using kmalloc(). The code in the dvb_core that is failing got BUG'ed in this case was this: int dvb_dmx_init(struct dvb_demux *dvbdemux) { int i; struct dmx_demux *dmx = &dvbdemux->dmx; dvbdemux->cnt_storage = NULL; dvbdemux->users = 0; dvbdemux->filter = vmalloc(dvbdemux->filternum * sizeof(struct dvb_demux_filter)); if (!dvbdemux->filter) return -ENOMEM; dvbdemux->feed = vmalloc(dvbdemux->feednum * sizeof(struct dvb_demux_feed)); if (!dvbdemux->feed) { vfree(dvbdemux->filter); <------- BUG/Oops happened in this call dvbdemux->filter = NULL; return -ENOMEM; } ... Which is kind of interesting: 1. The first vmalloc() succeeded. 2. The second vmalloc() failed. 3. The vfree() of the pointer from the first vmalloc() caused an Oops/BUG. I'm not sure where to go from here. Regards, Andy > Regards, > Andy > > -- > To unsubscribe from this list: send the line "unsubscribe linux-media" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html