On 02/12/2019 16:36, Alistair Popple wrote:
> On Monday, 2 December 2019 12:59:49 PM AEDT Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come into mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a windows based on DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
>
> Would it be possible to just enable this by default if the platform supports
> it? Are there any downsides?

It changes the second DMA window location, which QEMU currently assumes
to be at 0x800.0000.0000.0000, and I do not see an easy way to work
around this.

For example, say we start QEMU without VFIO but with an emulated XHCI,
which will ask for DDW. We (QEMU) have to pick a window location and
then stick to it. If a user later hotplugs a VFIO-PCI device, the
physical IOMMU has to support the previously selected DMA window
address; otherwise the hotplug is going to fail.

The question is how to tell QEMU about this new offset, and what we do
about migration from P8 (which, let's say, did have a VFIO device that
we unplug before the migration) to P9, with the prospect of hotplugging
a VFIO device there, but this time with this GTE4GB bit set.

> Adding it as an option seems like it would make
> things harder to support and reduces the amount of testing/use it would get.

Yeah, this is why this is an RFC...

>> supports VFIO+pseries machine - current this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is based on sha1
>> a6ed68d6468b Linus Torvalds "Merge tag 'drm-next-2019-11-27' of
>> git://anongit.freedesktop.org/drm/drm".
>
> Are you sure?

Almost. It should have been HEAD^^^^^..HEAD instead of HEAD^^^^..HEAD :)
I've posted 00/4 to the thread now, sorry about that. Thanks,

> I am getting the following rejected hunk trying to apply the
> first patch in the series:
>
> --- arch/powerpc/platforms/powernv/pci-ioda.c
> +++ arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2349,15 +2349,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
>          pe->tce_bypass_enabled = enable;
>  }
>
> -static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> -                int num, __u32 page_shift, __u64 window_size, __u32 levels,
> +static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
> +                __u32 page_shift, __u64 window_size, __u32 levels,
>                  bool alloc_userspace_copy, struct iommu_table **ptbl)
>  {
> -        struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> -                        table_group);
> -        int nid = pe->phb->hose->node;
> -        __u64 bus_offset = num ?
> -                pe->table_group.tce64_start : table_group->tce32_start;
>          long ret;
>          struct iommu_table *tbl;
>
> - Alistair
>
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>>   powerpc/powernv/ioda: Allow smaller TCE table levels
>>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>>  arch/powerpc/include/asm/iommu.h              |   1 +
>>  arch/powerpc/include/asm/opal-api.h           |  11 +-
>>  arch/powerpc/include/asm/opal.h               |   2 +
>>  arch/powerpc/platforms/powernv/pci.h          |   1 +
>>  include/uapi/linux/vfio.h                     |   2 +
>>  arch/powerpc/platforms/powernv/opal-call.c    |   2 +
>>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c     | 219 ++++++++++++++----
>>  drivers/vfio/vfio_iommu_spapr_tce.c           |  10 +-
>>  9 files changed, 202 insertions(+), 50 deletions(-)
>>
>>
>
>

-- 
Alexey
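
P.S. To make the two window placements discussed in this thread a bit more concrete, here is a minimal standalone sketch in plain C. It is not code from the series: the `sketch_*` names are hypothetical, and treating window selection as a simple compare against the second window's start address is a simplifying assumption about what the PHB does. The only values taken from the thread are the two start addresses, 1<<59 and 0x100000000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; only the values come from the thread. */
#define SKETCH_WIN_4GB   (1ULL << 32)  /* 0x100000000, the dma64_win_addr value above */
#define SKETCH_WIN_BIT59 (1ULL << 59)  /* 0x0800000000000000, the current default */

/*
 * Simplified model (an assumption): addresses below the second window's
 * start go to window 0, everything at or above it goes to window 1.
 */
static inline int sketch_pick_window(uint64_t dma_addr, uint64_t win1_start)
{
	return dma_addr >= win1_start ? 1 : 0;
}

/*
 * Can a device that drives only `mask_bits` address bits generate an
 * address inside the second window at all?
 */
static inline int sketch_mask_reaches(unsigned int mask_bits, uint64_t win1_start)
{
	uint64_t mask;

	assert(mask_bits <= 64);
	mask = mask_bits == 64 ? ~0ULL : (1ULL << mask_bits) - 1;
	return mask >= win1_start;
}
```

With this model, a device limited to 40 address bits can reach a second window starting at 4GB but not one starting at 1<<59, which is the motivation given in the cover letter. The flip side is the one raised above: whichever start address is picked first has to remain valid for any device hotplugged later.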