On Fri, 9 Jan 2009 08:58:46 +0800 "Han, Weidong" <weidong.han@xxxxxxxxx> wrote: > >> > >> The oops happens very early during boot in device_to_iommu (called > >> from domain_context_mapping_one). > >> > >> Looking at the code dump and the disassembled function here's where > >> the error happens: > >> > >> static struct intel_iommu *device_to_iommu(u8 bus, u8 devfn) { > >> struct dmar_drhd_unit *drhd = NULL; > >> int i; > >> > >> for_each_drhd_unit(drhd) { > >> if (drhd->ignored) > >> continue; > >> > >> for (i = 0; i < drhd->devices_cnt; i++) > >> if (drhd->devices[i]->bus->number == bus && > >> --> drhd->devices[0] is NULL > >> drhd->devices[i]->devfn == devfn) > >> return drhd->iommu; > >> > >> > >> Given how early this happens it's a little hard to provide logs, > >> etc. I literally used delay_boot=100 and wrote things down by hand > >> (forgot my digital camera) and then added printk's to verify). > >> > >> please let me know what other data I should collect. > > > yes, pls get the call trace. When device_to_iommu() is called, DMAR > should be already parsed from acpi table and registered, so > device_to_iommu() should not fail unless it's called earlier than > DMAR is parsed and registered. I updated to Linus' latest git (as your description made me wonder if the async stuff might play a role here). I still get an oops - but at a different spot and the system no longer hangs - it partly recovers (but things aren't too well - for example my USB keyboard / mouse don't work anymore). Here's the oops: Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.359578] ------------[ cut here ]------------ Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.410579] WARNING: at arch/x86/mm/ioremap.c:240 __ioremap_caller+0x150/0x2bd() Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.461578] Hardware name: 7465CTO Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.512578] Modules linked in: Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.614579] Pid: 1, comm: swapper Not tainted 2.6.28 #12 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.665578] Call Trace: Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.767581] [<ffffffff81038b49>] warn_slowpath+0xb1/0xed Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.869580] [<ffffffff81028319>] ? change_page_attr_set_clr+0x13e/0x2e6 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 12.971580] [<ffffffff810275b2>] __ioremap_caller+0x150/0x2bd Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.073581] [<ffffffff81158363>] ? alloc_iommu+0x140/0x181 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.175580] [<ffffffff810277f2>] ioremap_nocache+0x12/0x14 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.277580] [<ffffffff81158363>] alloc_iommu+0x140/0x181 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.379581] [<ffffffff8166a5d6>] dmar_table_init+0x115/0x265 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.481580] [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.583580] [<ffffffff8166abb1>] intel_iommu_init+0x16/0x8f3 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.685581] [<ffffffff813ce372>] ? mutex_lock+0x11/0x23 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.787581] [<ffffffff813bb9d1>] ? sysctl_net_init+0x1b/0x1f Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.889580] [<ffffffff8165687b>] ? pci_iommu_init+0x0/0x17 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 13.991580] [<ffffffff81656884>] pci_iommu_init+0x9/0x17 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.093581] [<ffffffff81009056>] _stext+0x56/0x12b Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.195581] [<ffffffff81071220>] ? register_irq_proc+0xa3/0xbf Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.297582] [<ffffffff810e0000>] ? proc_coredump_filter_write+0xe0/0xfe Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.399581] [<ffffffff8164e673>] kernel_init+0x139/0x191 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.501581] [<ffffffff8100d27a>] child_rip+0xa/0x20 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.603581] [<ffffffff8164e53a>] ? kernel_init+0x0/0x191 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.705581] [<ffffffff8100d270>] ? child_rip+0x0/0x20 Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.756580] ---[ end trace 4eaa2a86a8e2da22 ]--- Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.807580] IOMMU: can't map the region Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 14.858580] DMAR:parse DMAR table failure. later in the log file I find lots of these: Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 40.403251] nommu_map_single: overflow 13a08b248+8 of device mask ffffffff and finally Jan 8 17:51:00 dhohndel-mobl4 kernel: [ 66.777166] hub 4-0:1.0: unable to enumerate USB device on port 2 /D -- Dirk Hohndel Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html