Re: kernel oopses when using modules

"Ryan Sims" <rwsims@xxxxxxxxx> · Tue, 11 Dec 2007 07:17:02 -0500

On Dec 11, 2007 5:07 AM, Roman Kyrylych <roman.kyrylych@xxxxxxxxx> wrote:
> 2007/12/11, Ryan Sims <rwsims@xxxxxxxxx>:
>
> > I posted[1] to the forums about this when I thought it was an nvidia
> > problem, but now it seems to be more general.  I recently upgraded to
> > kernel26-2.6.23.9-1 from 2.6.23.8-1, and now I get oopses when certain
> > modules are accessed.  For example:
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual
> > address 0000000b
> >  printing eip:
> > c016c4da
> > *pde = 00000000
> > Oops: 0000 [#1]
> > PREEMPT SMP
> > Modules linked in: ext2 w83627ehf hwmon_vid ipv6 ohci1394 ieee1394
> > firewire_ohci firewire_core crc_itu_t tsdev usbhid hid ff_memless
> > usb_storage ide_core intel_agp agpgart ppp_generic sky2 sg evdev
> > thermal processor fan button battery ac kqemu i2c_i801 i2c_dev
> > i2c_core coretemp snd_hda_intel snd_pcm snd_timer snd_page_alloc
> > snd_hwdep snd soundcore slhc skge rtc ext3 jbd mbcache sd_mod sr_mod
> > cdrom ehci_hcd uhci_hcd usbcore ahci ata_generic pata_jmicron libata
> > CPU:    1
> > EIP:    0060:[<c016c4da>]    Not tainted VLI
> > EFLAGS: 00210206   (2.6.23-ARCH #1)
> > EIP is at find_vma+0xa/0x70
> > eax: 00000003   ebx: af09d000   ecx: af09d000   edx: af09d000
> > esi: 00000003   edi: af09d000   ebp: f5d94000   esp: f490de1c
> > ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> > Process qemu (pid: 7434, ti=f490c000 task=f5d94000 task.ti=f490c000)
> > Stack: 00000114 af09d000 c016cc1d 00000114 f9590000 af09d000 c016b084 f5d94000
> >        00000003 00000003 00000000 00000022 f9590000 f9590000 f9590000 00000114
> >        f9590000 f9590000 00000002 f935345f 00000001 00000001 00000000 f490de80
> > Call Trace:
> >  [<c016cc1d>] find_extend_vma+0x1d/0x70
> >  [<c016b084>] get_user_pages+0x44/0x2d0
> >  [<f935345f>] kqemu_lock_user_page+0x3f/0x80 [kqemu]
> >  [<f93549d7>] mon_user_map+0xe7/0x110 [kqemu]
> >  [<f93552cb>] kqemu_init+0x7eb/0xe20 [kqemu]
> >  [<c016ade1>] handle_mm_fault+0x501/0x760
> >  [<c016e051>] mmap_region+0x311/0x440
> >  [<f93531b9>] kqemu_ioctl+0x109/0x120 [kqemu]
> >  [<c018ae58>] do_ioctl+0x78/0x90
> >  [<c018b09e>] vfs_ioctl+0x22e/0x2b0
> >  [<c018b17d>] sys_ioctl+0x5d/0x70
> >  [<c0104482>] sysenter_past_esp+0x6b/0xa1
> >  [<c0360000>] wait_for_completion+0x30/0xa0
> >  =======================
> > Code: 00 89 d1 8b 50 20 39 ca 73 05 89 48 20 89 ca 8b 48 14 39 d1 73
> > 03 89 48 20 f3 c3 8d b6 00 00 00 00 56 85 c0 53 89 c6 89 d3 74 51 <8b>
> > 50 08 85 d2 74 05 39 5a 08 77 35 8b 4e 04 85 c9 74 3e 31 d2
> > EIP: [<c016c4da>] find_vma+0xa/0x70 SS:ESP 0068:f490de1c
> >
> > I don't want to waste the bandwidth, but I have another one very much
> > like it for nvidia when trying to run OpenGL stuff.  The dmesg above is from
> > the kqemu module and qemu.  I'd suspect hardware, but I don't have any
> > other reason to, and everything was working fine before the upgrade.
> > Also, it's only these two modules (so far) and the BUG always happens
> > at the same place, which seems a little too deterministic to be heat
> > issues.  I'm not familiar enough with the kernel changelog to be able
> > to narrow things down that way.  Any help?
>
> This exact problem happened with virtualbox-modules (vboxdrv).
> All modules need to be rebuilt against the .9 kernel.
> Make sure you have the latest drivers. It seems the problem is in
> kqemu (wasn't it rebuilt in our repos?).
>
> --
> Roman Kyrylych (Роман Кирилич)
>
Thanks for the response, but I found the problem.  It's much simpler
than that: PEBKAC.  While tearing my hair out last night and getting
ready to hack away at PKGBUILDs until 3, I found this in my pacman
log:

WARNING: /boot appears to be a seperate partition but is not mounted
This is most likely not what you want.  Please mount your /boot
partition and reinstall the kernel unless you are sure this is OK

And that's when I remembered marking /boot as "noauto" in fstab, and
of *course* I didn't remember to mount it before upgrading my kernel.
Well, a couple of rescue CD boots and some mkinitcpio hackery later, all
is well again.  Mea culpa, sorry for the static.
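For anyone else with a "noauto" /boot, a pre-upgrade check in the spirit of that pacman warning might look like the sketch below. This is not the actual logic of the kernel package's install script; the function names are hypothetical, and it simply assumes that "separate partition" means "has its own entry in /etc/fstab" and uses os.path.ismount to see whether it is currently mounted.

```python
import os


def fstab_lists_mountpoint(fstab_text: str, mountpoint: str = "/boot") -> bool:
    """Return True if fstab-formatted text has an entry for mountpoint (hypothetical helper)."""
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # The second field of an fstab entry is the mount point.
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False


def boot_needs_mounting(fstab_text: str, mountpoint: str = "/boot") -> bool:
    """True when mountpoint has its own fstab entry (e.g. marked noauto) but is not mounted now."""
    return fstab_lists_mountpoint(fstab_text, mountpoint) and not os.path.ismount(mountpoint)


if __name__ == "__main__":
    with open("/etc/fstab") as f:
        if boot_needs_mounting(f.read()):
            print("WARNING: /boot has an fstab entry but is not mounted; "
                  "mount it before upgrading the kernel")
```

Running something like this (or just `mount /boot`) before the kernel upgrade would have caught it, since with /boot unmounted the new kernel image lands on the root filesystem while the bootloader keeps loading the old one from the real /boot partition.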

-- 
Ryan W Sims