Hi. I'm a developer on the M5 simulator (m5sim.org), working on a CPU model that uses KVM as its execution engine. I ran into a kernel "BUG" where a NULL pointer is dereferenced in gfn_to_rmap. What happens on the kernel side is that gfn_to_rmap calls gfn_to_memslot. That function looks for the gfn in the memory slots, fails to find it, and returns a NULL pointer. gfn_to_rmap then tries to dereference it, and the kernel kills itself. I believe the original source of the call to gfn_to_memslot was mmu_alloc_roots (in 2.6.28.9; it may have moved since), which tries to get the page pointed to by CR3 using kvm_mmu_get_page. I'm less sure about that part, so here's the log output from the kernel:

May 15 18:54:46 fajita BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
May 15 18:54:46 fajita IP: [<ffffffff802127b3>] gfn_to_rmap+0x17/0x48
May 15 18:54:46 fajita PGD 136051067 PUD 1299fd067 PMD 0
May 15 18:54:46 fajita Oops: 0000 [#1] SMP
May 15 18:54:46 fajita last sysfs file: /sys/power/state
May 15 18:54:46 fajita CPU 0
May 15 18:54:46 fajita Modules linked in: snd_hda_intel nvidia(P) snd_pcm snd_timer snd iwlagn snd_page_alloc
May 15 18:54:46 fajita Pid: 7325, comm: m5.opt Tainted: P 2.6.28.9 #2
May 15 18:54:46 fajita RIP: 0010:[<ffffffff802127b3>] [<ffffffff802127b3>] gfn_to_rmap+0x17/0x48
May 15 18:54:46 fajita RSP: 0018:ffff880129963cf8 EFLAGS: 00010246
May 15 18:54:46 fajita RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
May 15 18:54:46 fajita RDX: 0000000000000000 RSI: 0000000000000070 RDI: ffff8801268d8000
May 15 18:54:46 fajita RBP: 0000000000000070 R08: 000000000000000a R09: 0000000000000000
May 15 18:54:46 fajita R10: 000000000000008b R11: 0000000000000002 R12: 0000000000000070
May 15 18:54:46 fajita R13: 0000000000000000 R14: 000000000000ae80 R15: 0000000000000070
May 15 18:54:46 fajita FS: 0000000041e1d950(0063) GS:ffffffff80ab2040(0000) knlGS:0000000000000000
May 15 18:54:46 fajita CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 15 18:54:46 fajita CR2: 0000000000000000 CR3: 0000000129909000 CR4: 00000000000026e0
May 15 18:54:46 fajita DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 15 18:54:46 fajita DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 15 18:54:46 fajita Process m5.opt (pid: 7325, threadinfo ffff880129962000, task ffff88013a1eacd0)
May 15 18:54:46 fajita Stack:
May 15 18:54:46 fajita  ffff88013aba6800 ffff8801299727b0 ffff8801268d8000 ffffffff80213abe
May 15 18:54:46 fajita  00000000000080d0 ffff8801299727b0 ffff88012f040590 00000000000e0044
May 15 18:54:46 fajita  ffff880129972040 ffffffff80213eeb ffff88013b282380 0000000000000246
May 15 18:54:46 fajita Call Trace:
May 15 18:54:46 fajita  [<ffffffff80213abe>] ? rmap_write_protect+0x25/0x123
May 15 18:54:46 fajita  [<ffffffff80213eeb>] ? kvm_mmu_get_page+0x2cb/0x320
May 15 18:54:46 fajita  [<ffffffff80214f51>] ? kvm_mmu_load+0x80/0x1b1
May 15 18:54:46 fajita  [<ffffffff806db286>] ? __down_read+0x12/0x93
May 15 18:54:46 fajita  [<ffffffff8020fc9c>] ? kvm_arch_vcpu_ioctl_run+0x1ce/0x621
May 15 18:54:46 fajita  [<ffffffff8020b590>] ? kvm_vcpu_ioctl+0xf2/0x448
May 15 18:54:46 fajita  [<ffffffff80287a8d>] ? handle_mm_fault+0x367/0x6dd
May 15 18:54:46 fajita  [<ffffffff802ae03e>] ? vfs_ioctl+0x21/0x6b
May 15 18:54:46 fajita  [<ffffffff802ae402>] ? do_vfs_ioctl+0x37a/0x3c1
May 15 18:54:46 fajita  [<ffffffff806dd616>] ? do_page_fault+0x444/0x806
May 15 18:54:46 fajita  [<ffffffff80407353>] ? __up_write+0x21/0x10e
May 15 18:54:46 fajita  [<ffffffff802ae485>] ? sys_ioctl+0x3c/0x5c
May 15 18:54:46 fajita  [<ffffffff802234db>] ? system_call_fastpath+0x16/0x1b
May 15 18:54:46 fajita Code: 26 21 80 48 89 f3 e8 33 ff ff ff 48 89 df 5b e9 c0 fe ff ff 55 48 89 f5 53 89 d3 48 83 ec 08 e8 60 78 ff ff 85 db 48 89 c1 75 11 <48> 2b 28 48 8d 14 ed 00 00 00 00 48 03 50 18 eb 19 48 8b 00 48
May 15 18:54:46 fajita RIP [<ffffffff802127b3>] gfn_to_rmap+0x17/0x48
May 15 18:54:46 fajita RSP <ffff880129963cf8>
May 15 18:54:46 fajita CR2: 0000000000000000
May 15 18:54:46 fajita ---[ end trace 61dc41d5d0f7fc5f ]---

I looked in your git repository, and this bug seems to be present in your most recent code.

The second problem was that CR3 didn't point to any memory even though it had a valid-looking value (0x7000). This was because our code relied on kvm_create to set up physical memory, and while that function takes parameters describing the memory and passes them around, it never actually seems to do anything with them. This also appears to be the case in your most recent code.

The series of events leading to the BUG was then the following:

1. Our code calls kvm_create to create the VM and its physical memory, only the first of which actually happens.
2. Our code tries to start a CPU in that VM from a point where paging is turned on and CR3 has a value that points into the physical memory that doesn't exist.
3. The kernel code tries to get at the reverse mapping for the guest page frame number.
4. Code below that tries to find the "slot" for that address, fails to do so, but continues anyway, causing the kernel to dereference a NULL pointer.
5. Kablooey.

I am a full-time employee of VMware, and while I work on M5 on my own time, that places certain limits on what I can do to help fix these bugs. While I probably can't implement anything, I should be able to provide more information about what we're doing with M5, or about the crash, if that would help.
Gabe Black