On 2/3/14, Ahmed A <ahmedcali@xxxxxxxxx> wrote:
> Hello,
>
> I have a server with onboard Intel 10G ports (82599). When I load the kernel
> module driver for these ports, everything is fine: I can see the newly
> created ethX devices using "ip addr show". However, after I assign an IP
> address, and right after I issue the command to bring up the port, I get a
> kernel panic related to DMAR (DMA remapping) in the VFIO (Virtual Function
> IO) module. I am not even sure why I am getting this panic, since this Intel
> kernel module does not use VFIO. I do know the immediate cause of the panic:
> NULL is passed as a parameter to vfio_group_get(), which dereferences it.
> I know NULL is passed because register RDI, which is used to pass the
> first argument to a function, contains 0.
>
> Linux kernel 3.6.11
>
> Following is the stack trace of the panic:

You should post your message to kvm@xxxxxxxxxxxxxxx and CC Alexey
Kardashevskiy <aik@xxxxxxxxx>, who added that function.

>
> # [11036.855410] BUG: unable to handle kernel [11036.887249] ixgbe 0000:84:00.0: eth6: detected SFP+: 3
> NULL pointer dereference at (null)
> [11037.010224] IP: [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
> [11037.085047] PGD 1fd6b5b067 PUD 20404b1067 PMD 0
> [11037.140181] Oops: 0000 [#1] SMP
> [11037.178676] Modules linked in: ixgbe(O) nfsv3 autofs4 nfsd nfs_acl nfs lockd sunrpc vfio_pci vfio_iommu_type1 vfio i2c_mux i2c_smbus i2c_dev container ide_pci_generic ide_core uhci_hcd isci ata_generic
> [11037.393137] CPU 0
> [11037.414974] Pid: 14045, comm: kworker/0:0 Tainted: G O 3.6.11
> [11037.539628] RIP: 0010:[<ffffffffa006615a>] [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
> [11037.643521] RSP: 0018:ffff881f52453d00 EFLAGS: 00010282
> [11037.706886] RAX: ffff881fd6740680 RBX: 0000000000000000 RCX: ffff88204157ec00
> [11037.792053] RDX: 0000000000000084 RSI: 0000000001f5327a RDI: 0000000000000000
> [11037.877221] RBP: ffff881f52453d10 R08: ffff881f5327abe0 R09: 0000000000000000
> [11037.962394] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88204157f800
> [11038.024995] ixgbe 0000:84:00.0: eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
> [11038.025144] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
> [11038.211671] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000000
> [11038.296842] FS: 0000000000000000(0000) GS:ffff88204f000000(0000) knlGS:0000000000000000
> [11038.393430] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [11038.461988] CR2: 0000000000000000 CR3: 0000001fd686d000 CR4: 00000000001407f0
> [11038.547156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [11038.632326] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [11038.717496] Process kworker/0:0 (pid: 14045, threadinfo ffff881f52452000, task ffff882034d61950)
> [11038.822392] Stack:
> [11038.846298] 0000000000000084 ffff881fd6740680 ffff881f52453d30 ffffffffa006618a
> [11038.934688] 0000000001f5327a ffff882035e23e00 ffff881f52453d50 ffffffffa0066442
> [11039.023078] ffff881f52453d70 ffff881fd6740680 ffff881f52453d70 ffffffffa0072072
> [11039.111465] Call Trace:
> [11039.140571] [<ffffffffa006618a>] vfio_device_get+0x12/0x30 [vfio]
> [11039.214324] [<ffffffffa0066442>] vfio_device_get_from_dev+0x19/0x1f [vfio]
> [11039.297425] [<ffffffffa0072072>] vfio_pci_dmar_error_handler+0x13/0x4a [vfio_pci]
> [11039.387796] [<ffffffff81420cc6>] dmar_fault_do_one+0xd4/0xf1
> [11039.456366] [<ffffffff8104175d>] process_one_work+0x1c2/0x311
> [11039.525968] [<ffffffff81041568>] ? manage_workers+0x23a/0x24c
> [11039.595566] [<ffffffff81420bf2>] ? dmar_get_fault_reason+0x52/0x52
> [11039.670354] [<ffffffff81041b42>] worker_thread+0x26c/0x34a
> [11039.736840] [<ffffffff810418d6>] ? process_scheduled_works+0x2a/0x2a
> [11039.813710] [<ffffffff8104583a>] kthread+0x86/0x8e
> [11039.871891] [<ffffffff81604bf4>] kernel_thread_helper+0x4/0x10
> [11039.942524] [<ffffffff810457b4>] ? kthread_freezable_should_stop+0x4d/0x4d
> [11040.025618] [<ffffffff81604bf0>] ? gs_change+0xb/0xb
> [11040.085865] Code: 48 8b 00 48 8b 40 20 48 85 c0 74 0c 55 48 8b 7f 40 48 89 e5 ff d0 eb 08 48 c7 c0 ea ff ff ff c3 5d c3 55 48 89 e5 53 48 89 fb 52 <8b> 07 85 c0 75 11 be 2a 00 00 00 48 c7 c7 38 76 06 a0 e8 32 84
> [11040.312869] RIP [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
> [11040.388722] RSP <ffff881f52453d00>
> [11040.430282] CR2: 0000000000000000
>
> - Can someone please help me understand the dmar/vfio related function calls
> in the backtrace, and why they are getting invoked? I know what causes a
> DMAR error, but I am not sure how this could be happening, since none of the
> devices is managed by VFIO.
>
> - Looking at the source code, it seems dmar_fault_do_one() is called from the
> interrupt handler dmar_fault(). I am just curious: why is dmar_fault() not
> part of the stack trace?
>
> - What is the significance of the "?" in front of some of the functions in
> the backtrace (e.g. dmar_get_fault_reason())?
>
> Thank you,
> Ahmed.
>

--
Regards,
Denis

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
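
The reasoning in the quoted message (RDI holds the first argument and is 0, so vfio_group_get() received NULL) can be checked mechanically against the oops text. Below is a minimal sketch in Python, assuming the oops is available as a plain string that may still carry "> " quote markers and wrapped lines. The names parse_registers, faulting_bytes and SAMPLE are hypothetical helpers made up for this illustration, not part of any kernel tool; SAMPLE is an excerpt copied from the trace above.

```python
import re

# Excerpt of the quoted oops; wrapped lines and "> " markers are kept on purpose.
SAMPLE = """\
> [11037.706886] RAX: ffff881fd6740680 RBX: 0000000000000000 RCX:
> ffff88204157ec00
> [11037.792053] RDX: 0000000000000084 RSI: 0000000001f5327a RDI:
> 0000000000000000
> [11038.461988] CR2: 0000000000000000 CR3: 0000001fd686d000 CR4:
> 00000000001407f0
> [11040.085865] Code: 48 8b 00 48 8b 40 20 48 85 c0 74 0c 55 48 8b 7f 40 48
> 89 e5 ff d0 eb 08 48 c7 c0 ea ff ff ff c3 5d c3 55 48 89 e5 53 48 89 fb 52
> <8b> 07 85 c0 75 11 be 2a 00 00 00 48 c7 c7 38 76 06 a0 e8 32 84
"""


def parse_registers(oops_text):
    """Map general-purpose register names (and CR2) to integer values."""
    regs = {}
    # "[\s>]+" lets a value sit on the next quoted line after its name.
    pattern = r"\b(R[A-Z0-9]{2}|CR2):[\s>]+([0-9a-f]{16})\b"
    for name, value in re.findall(pattern, oops_text):
        regs.setdefault(name, int(value, 16))  # keep the first occurrence
    return regs


def faulting_bytes(oops_text):
    """Return the opcode bytes from the '<xx>' marker on the Code: line onward."""
    m = re.search(r"Code:((?:[\s>]+(?:<[0-9a-f]{2}>|[0-9a-f]{2}))+)", oops_text)
    if not m:
        return []
    tokens = re.findall(r"<?[0-9a-f]{2}>?", m.group(1))
    for i, tok in enumerate(tokens):
        if tok.startswith("<"):  # the kernel brackets the byte at the faulting RIP
            return [int(t.strip("<>"), 16) for t in tokens[i:]]
    return []
```

On this excerpt, parse_registers(SAMPLE)["RDI"] and ["CR2"] are both 0, and faulting_bytes(SAMPLE) starts with 8b 07. On x86-64 those two bytes encode a 32-bit load from the address held in RDI into EAX, so a read through a zero RDI with a faulting address (CR2) of 0 agrees with the conclusion that the first argument was NULL.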