https://bugzilla.kernel.org/show_bug.cgi?id=76331

Alex Williamson <alex.williamson@xxxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dwmw2@xxxxxxxxxxxxx

--- Comment #4 from Alex Williamson <alex.williamson@xxxxxxxxxx> ---
The DRHD capability registers are reported as:

    IOMMU 0: c9008010e60262    (MGAW field, bits 21:16 = 100110b = 0x26)
    IOMMU 1: c9078010ef0462    (MGAW field, bits 21:16 = 101111b = 0x2f)

From the VT-d spec (v2.2), bits 12:8 of the capability register are the
Supported Adjusted Guest Address Widths (SAGAW), defined as:

    This 5-bit field indicates the supported adjusted guest address
    widths (which in turn represents the levels of page-table walks for
    the 4KB base page size) supported by the hardware implementation.

    A value of 1 in any of these bits indicates the corresponding
    adjusted guest address width is supported.  The adjusted guest
    address widths corresponding to various bit positions within this
    field are:

    • 0: Reserved
    • 1: 39-bit AGAW (3-level page-table)
    • 2: 48-bit AGAW (4-level page-table)
    • 3: Reserved
    • 4: Reserved

    Software must ensure that the adjusted guest address width used to
    set up the page tables is one of the supported guest address widths
    reported in this field.

This system therefore has one DRHD unit supporting 3-level page tables
(IOMMU 0) and the other supporting 4-level page tables (IOMMU 1).

Bits 21:16 are the Maximum Guest Address Width:

    This field indicates the maximum DMA virtual addressability
    supported by remapping hardware.  The Maximum Guest Address Width
    (MGAW) is computed as (N+1), where N is the value reported in this
    field.  For example, a hardware implementation supporting 48-bit
    MGAW reports a value of 47 (101111b) in this field.

    If the value in this field is X, untranslated and translated DMA
    requests to addresses above 2^(X+1)-1 are always blocked by
    hardware.  Device-TLB translation requests to addresses above
    2^(X+1)-1 from allowed devices return a null Translation-Completion
    Data with R=W=0.

    Guest addressability for a given DMA request is limited to the
    minimum of the value reported through this field and the adjusted
    guest address width of the corresponding page-table structure.
    (Adjusted guest address widths supported by hardware are reported
    through the SAGAW field.)

    Implementations must support MGAW at least equal to the physical
    addressability (host address width) of the platform.

On this system, IOMMU 0 therefore has a MGAW of 0x26 + 1 = 39 bits,
and IOMMU 1 has 0x2f + 1 = 48 bits.

The BUG we're hitting is:

    BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);

So the last PFN of the domain is beyond the address width of the
domain.

last_pfn here is created from DOMAIN_MAX_PFN(domain->gaw).  All VM
domains are created with a 48-bit width (domain->gaw):

    #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

So the default last_pfn is 0xf_ffff_ffff.

Given the default 48-bit width, the default domain AGAW (Adjusted
Guest Address Width) is 2 (domain->agaw).

When we add devices to the domain, the gaw is updated to match:

        /* check if this iommu agaw is sufficient for max mapped address */
        addr_width = agaw_to_width(iommu->agaw);
        if (addr_width > cap_mgaw(iommu->cap))
                addr_width = cap_mgaw(iommu->cap);

        if (dmar_domain->max_addr > (1LL << addr_width)) {
                printk(KERN_ERR "%s: iommu width (%d) is not "
                       "sufficient for the mapped address (%llx)\n",
                       __func__, addr_width, dmar_domain->max_addr);
                return -EFAULT;
        }
        dmar_domain->gaw = addr_width;

iommu->agaw is calculated from the SAGAW, and will be either 1 or 2
here depending on which IOMMU manages the device.
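For anyone who wants to double-check the field decoding, below is a
quick standalone userspace sketch (not the kernel's cap_*() helpers,
just the bit positions from the spec text quoted above).  It
reproduces the 39-bit/48-bit MGAW values and the SAGAW bits that lead
to iommu->agaw being 1 or 2 on this system:

#include <stdio.h>
#include <stdint.h>

/*
 * Decode the SAGAW (bits 12:8) and MGAW (bits 21:16) fields from the
 * two capability register values reported in this bug.
 */
static void decode_cap(const char *name, uint64_t cap)
{
        unsigned int sagaw = (cap >> 8) & 0x1f;         /* SAGAW, bits 12:8 */
        unsigned int mgaw = ((cap >> 16) & 0x3f) + 1;   /* MGAW is N+1, bits 21:16 */

        printf("%s: cap = %#llx, MGAW = %u bits\n",
               name, (unsigned long long)cap, mgaw);
        if (sagaw & (1 << 1))
                printf("  39-bit AGAW (3-level page table) -> agaw 1\n");
        if (sagaw & (1 << 2))
                printf("  48-bit AGAW (4-level page table) -> agaw 2\n");
}

int main(void)
{
        decode_cap("IOMMU 0", 0xc9008010e60262ULL);     /* 39-bit MGAW, 3-level */
        decode_cap("IOMMU 1", 0xc9078010ef0462ULL);     /* 48-bit MGAW, 4-level */
        return 0;
}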
One bug stands out here: domain->gaw is set to the width of the iommu
for the last device added, so an initial suspicion would be that you
could avoid the problem by re-ordering the qemu command line to create
the devices in the reverse order.

So, depending on the order the devices were added, domain->gaw is
either 48 bits or 39 bits, and therefore last_pfn going into the
BUG_ON is either 0xf_ffff_ffff or 0x7ff_ffff.

addr_width is set from 'agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT',
where domain->agaw is initially 2; however, just beyond the above code
snippet we have:

        /*
         * Knock out extra levels of page tables if necessary
         */
        while (iommu->agaw < dmar_domain->agaw) {
                struct dma_pte *pte;

                pte = dmar_domain->pgd;
                if (dma_pte_present(pte)) {
                        dmar_domain->pgd = (struct dma_pte *)
                                phys_to_virt(dma_pte_addr(pte));
                        free_pgtable_page(pte);
                }
                dmar_domain->agaw--;
        }

Therefore, when we add the device behind the 39-bit IOMMU first, we
get:

    last_pfn = 0x7ff_ffff
    addr_width = 27 (39 - VTD_PAGE_SHIFT)

but then we add the device behind the 48-bit IOMMU and get:

    last_pfn = 0xf_ffff_ffff
    addr_width = 27

resulting in the BUG_ON firing (0xf_ffff_ffff >> 27 is non-zero).

The fix might simply be to change the GAW assignment above to:

        dmar_domain->gaw = min(dmar_domain->gaw, addr_width);
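To see concretely why the min() avoids the BUG_ON, here is a minimal
userspace model of the bookkeeping described above.  The struct and
the attach()/check() helpers are made up for illustration, and the
macros are reconstructed to match the values derived above (e.g.
DOMAIN_MAX_PFN(48) == 0xf_ffff_ffff); the real attach path also clamps
to cap_mgaw() and frees the extra page-table level:

#include <assert.h>
#include <stdio.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT          12
#define DOMAIN_MAX_PFN(gaw)     ((((uint64_t)1) << ((gaw) - VTD_PAGE_SHIFT)) - 1)

struct domain {
        int gaw;        /* width used for DOMAIN_MAX_PFN() */
        int agaw;       /* page-table levels built: 1 = 39-bit, 2 = 48-bit */
};

static int agaw_to_width(int agaw)
{
        return 30 + agaw * 9;           /* agaw 1 -> 39 bits, agaw 2 -> 48 bits */
}

/* Simplified model of the attach path; with_fix selects the proposed min(). */
static void attach(struct domain *d, int iommu_agaw, int with_fix)
{
        int addr_width = agaw_to_width(iommu_agaw);

        d->gaw = with_fix ? (d->gaw < addr_width ? d->gaw : addr_width)
                          : addr_width;
        if (iommu_agaw < d->agaw)       /* "knock out extra levels" */
                d->agaw = iommu_agaw;
}

static void check(struct domain *d)
{
        uint64_t last_pfn = DOMAIN_MAX_PFN(d->gaw);
        int addr_width = agaw_to_width(d->agaw) - VTD_PAGE_SHIFT;

        /* The BUG_ON condition quoted above (BITS_PER_LONG == 64 here) */
        assert(!(addr_width < 64 && last_pfn >> addr_width));
}

int main(void)
{
        struct domain d = { .gaw = 48, .agaw = 2 };     /* DEFAULT_DOMAIN_ADDRESS_WIDTH */

        attach(&d, 1, 1);       /* device behind the 39-bit IOMMU first */
        attach(&d, 2, 1);       /* then the device behind the 48-bit IOMMU */
        check(&d);              /* passes with the min(); pass with_fix = 0 to trip it */
        printf("gaw = %d, agaw = %d, last_pfn = %#llx\n", d.gaw, d.agaw,
               (unsigned long long)DOMAIN_MAX_PFN(d.gaw));
        return 0;
}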