On Tue, Sep 13, 2016 at 2:47 AM, Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
> On Mon, 12 Sep 2016 10:14:55 -0700
> Benjamin Serebrin <serebrin@xxxxxxxxxx> wrote:
>
>> Sure, SWIOTLB is Linux-specific, but general bounce buffering isn't.
>>
>> The idea is that the ACPI bit promises that the guest will never
>> need [SWIOTLB] bounce buffering. That means either no hotplugging at
>> all, or no hotplugging of high-mem-incapable devices. If our VMM ever
>> _adds_ a device to its catalog that's capable of hotplug but not
>> highmem, we'll clear the ACPI bit, for example. I'm happy to discuss
>> and iterate over what promises are made by the ACPI bit if you'd like.
> The implication of the above is that you effectively push the kernel's
> iommu=off option up the stack, where it would have to be configured to
> disable hotplug (which, for example, is enabled by default in QEMU).
> Also, every existing/future device has to be modified to provide a
> highmem-cap property so that the emulator/firmware could decide if
> the above ACPI table is necessary. It's doable if an emulator generates
> ACPI tables, but close to impossible (via standard interfaces)
> if it's the firmware's job.
>
> If hotplug is allowed by default and the SWIOTLB ACPI table is generated
> at boot when there aren't any low-mem devices present at boot,
> then one would need to fix the kernel to try to dynamically allocate
> SWIOTLB and fail high-mem-incapable device hotplug if it's unable to do so.
>
> Trying to save 64MB out of more than 4GB of memory at the above cost
> seems a little bit excessive.

I don't recommend such complexity; I was proposing a hint bit in ACPI
as a simple promise from the hypervisor.

64MB is about 1.5% of a 4GB machine. We wanted an easy way to give it
back to the guest.

>
> Another question:
> why not run the emulator with an emulated IOMMU enabled? Then Linux uses
> real IOMMU dma_ops (intel/amd), and the 64MB for SWIOTLB is not wasted
> (it gets freed) while keeping 32-bit devices operational?
> Last time I tested it, it worked just fine for both coldplug and
> hotplug cases, without the need to mess with emulators or any hardware
> to provide a SWIOTLB ACPI table.
>

IOMMU comes with its own overheads. For example, until kernel v4.7, in
which the speedup to the Intel IOMMU ops was merged, the guest
intel-iommu.c code had significant performance scalability issues. I
would be more willing to try to get distros to backport a simple
no-SWIOTLB change than the fairly invasive IOMMU optimizations. We'll be
living with many pre-4.7 guests for quite a while.

>
>> The problem with dynamic allocation of the bounce buffer is that the
>> SWIOTLB code seems to demand contiguous low memory, and allocating
>> contiguous memory after boot is never guaranteed because of
>> fragmentation and subsequent pinning. The original code seems to be
>> motivated by this: it does an early allocation of contiguous low memory
>> and then a late deallocation if it determines that SWIOTLB is not
>> needed. I imagine they wanted to cover cases where some
>> high-mem-incapable device needed a contiguous target buffer because it
>> had no (or insufficient) scatter/gather capability.
>>
>> One could tie hotplug of a bounce-buffer-requiring virtual device to
>> SWIOTLB allocation, and fail the device initialization if the
>> required buffer couldn't be allocated. I don't know of any new
>> virtual devices that require that, though, as high-mem-incapability is
>> hopefully only a vestige of very old virtual or real devices. And the
>> plumbing complexity for doing this is much higher than seems
>> justified.
> It could possibly be done in a centralized manner in the kernel when a
> device driver initializes the DMA API, for example in
> dma_set_mask_and_coherent().
> Even if it's done, it would be a regression if the kernel is unable to
> allocate a bounce buffer on demand and device init fails where it used
> to work with a preallocated SWIOTLB.
>
>
>>
>> Thanks!
>> Ben
>>
>> On Mon, Sep 12, 2016 at 4:55 AM, Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
>> > On Sun, 28 Aug 2016 23:36:20 -0700
>> > Benjamin Serebrin <serebrin@xxxxxxxxxx> wrote:
>> >
>> >> Thanks, all,
>> >>
>> >> The general view from last week is to pursue an ACPI table that
>> >> indicates that the SWIOTLB isn't needed. I'll work with our local
>> >> ACPI experts on the table format.
>> > Isn't SWIOTLB a Linux-specific implementation detail?
>> > Suppose a guest is started without SWIOTLB and later the user hotplugs
>> > a device that is not capable of handling high mem. What then?
>> >
>> > Wouldn't it be better to have the kernel create/allocate SWIOTLB
>> > on demand (i.e. when devices that require it are present)
>> > instead of making the hardware (hypervisor) provide some obscure
>> > ACPI table quirk to fix a kernel issue?
>> >
>> >>
>> >> For existing guests, we'll work on language suggesting kernel command
>> >> line options (iommu=off) if people are concerned, and will look into
>> >> doing the command line setting in our own provided images.
>> >>
>> >> On Thu, Aug 25, 2016 at 7:45 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>> >> > 2016-08-26 9:16 GMT+08:00 Yang Zhang <yang.zhang.wz@xxxxxxxxx>:
>> >> >> On 2016/8/24 22:36, Benjamin Serebrin wrote:
>> >> >>>
>> >> >>> iommu=off would kill the SWIOTLB as well, while swiotlb=1 consumes 1MB.
>> >> >>>
>> >> >>> However, maintaining guests' kernel command lines is something we'd
>> >> >>> like to stay away from if possible. It's certainly a short-term
>> >> >>
>> >> >>
>> >> >> I don't quite understand why you'd stay away from the kernel command
>> >> >> line. It provides more flexibility, allowing you to turn it on/off
>> >> >> yourself.
>> >> >
>> >> > I agree with Benjamin. It will result in customers having to tune
>> >> > their guest OSes' kernel command lines, or in us supplying guest
>> >> > images with a modified kernel command line.
>> >> >
>> >> > Regards,
>> >> > Wanpeng Li
>> >> >
>> >> >>
>> >> >>
>> >> >>> answer, or something individual customers can choose to do today.
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
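This is a minimal sketch of the on-demand alternative debated in this thread. When a driver sets its DMA mask (Igor's dma_set_mask_and_coherent() suggestion), the code checks whether the device can reach all guest RAM. Only if it cannot does it try to allocate a contiguous low-memory bounce pool, and it fails device init if that allocation is impossible. This is hypothetical user-space C, not kernel code. dma_bit_mask() mirrors the idea of the kernel's DMA_BIT_MASK() macro. struct bounce_pool, needs_bounce_buffer(), try_alloc_contiguous_low() and hypothetical_set_dma_mask() are illustrative names invented here. Fragmentation is crudely modeled as a single "largest free low block" size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's DMA_BIT_MASK(n): the n low bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
    assert(bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* A device needs bounce buffering if some guest RAM lies above its DMA mask. */
static bool needs_bounce_buffer(uint64_t dma_mask, uint64_t max_guest_phys_addr)
{
    return max_guest_phys_addr > dma_mask;
}

/* Hypothetical lazily allocated bounce-buffer pool (not a kernel struct). */
struct bounce_pool {
    uint64_t size;   /* bytes wanted, e.g. 64 MiB */
    bool allocated;  /* set once the contiguous allocation succeeds */
};

/*
 * Hypothetical stand-in for a contiguous low-memory allocation after boot:
 * it succeeds only if fragmentation has left a large enough free block.
 */
static bool try_alloc_contiguous_low(uint64_t size, uint64_t largest_free_low_block)
{
    return size <= largest_free_low_block;
}

/*
 * Hypothetical hook run when a driver sets its DMA mask.
 * Returns 0 on success, -1 if device initialization would have to fail.
 */
static int hypothetical_set_dma_mask(struct bounce_pool *pool, uint64_t dma_mask,
                                     uint64_t max_guest_phys_addr,
                                     uint64_t largest_free_low_block)
{
    if (!needs_bounce_buffer(dma_mask, max_guest_phys_addr))
        return 0;   /* device can reach all guest RAM directly */
    if (pool->allocated)
        return 0;   /* an earlier device already triggered the allocation */
    if (!try_alloc_contiguous_low(pool->size, largest_free_low_block))
        return -1;  /* a boot-time preallocated pool would have worked here */
    pool->allocated = true;
    return 0;
}
```

Consider a guest whose highest guest-physical RAM address is 0x1FFFFFFFF. There, a 64-bit-capable device never triggers the pool, while a 32-bit device (mask 0xFFFFFFFF) does. Suppose the pool is 64 MiB and only 16 MiB of contiguous free low memory remains. Then hypothetical_set_dma_mask() returns -1, which is the regression Igor points out relative to a preallocated SWIOTLB.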