On Tuesday 10 November 2015 15:58:19 Sinan Kaya wrote: > > On 11/10/2015 2:56 PM, Arnd Bergmann wrote: > >> The ACPI IORT table declares whether you enable IOMMU for a particular > >> >device or not. The placement of IOMMU HW is system specific. The IORT > >> >table gives the IOMMU HW topology to the operating system. > > This sounds odd. Clearly you need to specify the IOMMU settings for each > > possible PCI device independent of whether the OS actually uses the IOMMU > > or not. > > There are provisions to have DMA mask in the PCIe host bridge not at the > PCIe device level inside IORT table. This setting is specific for each > PCIe bus. It is not per PCIe device. Same thing, I meant the bootloader must provide all the information that is needed to use the IOMMU on all PCI devices. I don't care where the IOMMU driver gets that information. Some IOMMUs require programming a bus/device/function specific number into the I/O page tables, and they might not always have the same algorithm to map from the PCI numbers into their own number space. > It is assumed that the endpoint device driver knows the hardware for > PCIe devices. The driver can also query the supported DMA bits by this > platform via DMA APIs and will request the correct DMA mask from the DMA > subsystem (surprise!). I know how the negotiation works. Note that dma_get_required_mask() will only tell you what mask the device needs to access all of memory, while both the device and bus may have additional limitations, and there is not always a solution. > >In a lot of cases, we want to turn it off to get better performance > > when the driver has set a DMA mask that covers all of RAM, but you > > also want to enable the IOMMU for debugging purposes or for device > > assignment if you run virtual machines. The bootloader doesn't know how > > the device is going to be used, so it cannot define the policy here. > > I think we'll end up adding a virtualization option to the UEFI BIOS > similar to how Intel platforms work. Based on this switch, we'll end up > patching the ACPI table. > > If I remove the IORT entry, then the device is in coherent mode with > device accessing the full RAM range. > > If I have the IORT table, the device is in IOMMU translation mode. > > Details are in the IORT spec. I think that would suck a lot more than being slightly out of spec regarding SBSA if you make the low PCI addresses map to the start of RAM. Asking users to select a 'virtualization' option based on what kind of PCI device and kernel version they have is a major hassle. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html