On Mon, 12 Sep 2016 10:14:55 -0700
Benjamin Serebrin <serebrin@xxxxxxxxxx> wrote:

> Sure, SWIOTLB is linux-specific but general bounce buffering isn't.
>
> The idea is that the ACPI bit promises that the guest will not ever
> need [SWIOTLB] bounce buffering.  That means either no hotplugging at
> all, or no hotplugging of high-mem-incapable devices.  If our VMM ever
> _adds_ a device to its catalog that's capable of hotplug but not
> highmem, we'll clear the ACPI bit, for example.  I'm happy to discuss
> and iterate over what promises are made by the ACPI bit if you'd like.

The implication of the above is that you effectively push the kernel's
iommu=off option up the stack, where it would have to be configured to
disable hotplug (which is, for example, enabled by default in QEMU).

Also, every existing and future device would have to be modified to
provide a highmem-cap property so that the emulator/firmware could
decide whether the above ACPI table is necessary. That is doable if the
emulator generates the ACPI tables, but close to impossible (via
standard interfaces) if it's the firmware's job.

If hotplug is allowed by default and the SWIOTLB ACPI table is generated
at boot when there aren't any low-mem devices present, then one would
need to fix the kernel to try to allocate SWIOTLB dynamically and to
fail hotplug of a high-mem-incapable device if it's unable to do so.

Trying to save 64MB out of more than 4GB of memory at the above cost
seems a little excessive.

Another question: why not run the emulator with an emulated IOMMU
enabled? Then Linux uses the real IOMMU dma_ops (intel/amd), and the
64MB for SWIOTLB is freed rather than wasted, while 32-bit devices stay
operational.

Last time I tested it, it worked just fine in both the coldplug and
hotplug cases, without any need to modify emulators or hardware to
provide a SWIOTLB ACPI table.

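To make the on-demand policy discussed above concrete, here is a toy model of the decision a kernel would have to make at hotplug time. It is only a sketch, not kernel code: the names (Guest, needs_bounce, hotplug) and the single "largest contiguous free low block" figure are hypothetical simplifications, and the 64MB size is just the figure mentioned in this thread.

```python
from dataclasses import dataclass

SWIOTLB_SIZE = 64 << 20  # 64MB, the bounce buffer size discussed in this thread


@dataclass
class Guest:
    """Hypothetical, highly simplified view of guest memory state."""
    ram_top: int                  # highest guest physical address + 1
    free_low_contig: int          # largest contiguous free block below 4GB, in bytes
    swiotlb_allocated: bool = False


def needs_bounce(guest: Guest, dma_mask_bits: int) -> bool:
    """A device needs bounce buffering if it cannot address all guest RAM."""
    return (1 << dma_mask_bits) < guest.ram_top


def hotplug(guest: Guest, dma_mask_bits: int) -> bool:
    """Return True if the device can be initialized, False if hotplug must fail."""
    if not needs_bounce(guest, dma_mask_bits):
        return True               # high-mem-capable device, no SWIOTLB needed
    if guest.swiotlb_allocated:
        return True               # bounce buffer already exists
    if guest.free_low_contig >= SWIOTLB_SIZE:
        # Late allocation succeeds only if contiguous low memory is still free.
        guest.free_low_contig -= SWIOTLB_SIZE
        guest.swiotlb_allocated = True
        return True
    return False                  # fragmentation: device init has to fail
```

In this model a 64-bit device always plugs in, while a 32-bit device in an 8GB guest succeeds only if 64MB of contiguous low memory is still available at the time of hotplug.
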
> The problem with dynamic allocation of the bounce buffer is that the
> SWIOTLB code seems to demand contiguous low memory, and allocating
> contiguous memory after boot is never guaranteed because of
> fragmentation and subsequent pinning.  The original code seems to be
> motivated by this: it does an early allocation of a contiguous low mem
> and then a late deallocation if it determines that SWIOTLB is not
> needed.  I imagine they wanted to cover cases where some high
> mem-incapable device needed a contiguous target buffer because it had
> no (or insufficient) scatter/gather capability.
>
> One could tie hot plug of a bounce-buffer-requiring virtual device to
> causing SWIOTLB allocation, and fail the device initialization if the
> required buffer couldn't be allocated.  I don't know of any new
> virtual devices that require that, though, as high-mem-incapability is
> hopefully only a vestige of very old virtual or real devices.  And the
> plumbing complexity for doing this is much higher than seems
> justified.

It could possibly be done in a centralized manner in the kernel, when a
device driver initializes the DMA API, for example in
dma_set_mask_and_coherent().

Even if that's done, it would be a regression if the kernel were unable
to allocate the bounce buffer on demand and device init failed where it
previously worked with a preallocated SWIOTLB.

>
> Thanks!
> Ben
>
> On Mon, Sep 12, 2016 at 4:55 AM, Igor Mammedov <imammedo@xxxxxxxxxx> wrote:
> > On Sun, 28 Aug 2016 23:36:20 -0700
> > Benjamin Serebrin <serebrin@xxxxxxxxxx> wrote:
> >
> >> Thanks, all,
> >>
> >> The general view from last week is to pursue an ACPI table that
> >> indicates that the SWIOTLB isn't needed.  I'll work with our local
> >> ACPI experts on table format.
> > Isn't SWIOTLB linux specific impl. detail?
> > Suppose guest is started without SWIOTLB and later user hotplugs
> > a device that not capable to handle high mem, what's then?
> >
> > Wouldn't it be better to make SWIOTLB created/allocated
> > on demand in kernel (i.e. presence of devices that require it)
> > instead of making hardware(hypervisor) to provide some obscure
> > ACPI table quirk to fix kernel issue?
> >
> >>
> >> For existing guests, we'll work on language suggesting kernel command
> >> line options (iommu=off) if people are concerned, and will look into
> >> doing the command line setting in our own provided images.
> >>
> >> On Thu, Aug 25, 2016 at 7:45 PM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> >> > 2016-08-26 9:16 GMT+08:00 Yang Zhang <yang.zhang.wz@xxxxxxxxx>:
> >> >> On 2016/8/24 22:36, Benjamin Serebrin wrote:
> >> >>>
> >> >>> iommu=off would kill the SWIOTLB as well, while swiotlb=1 consumes 1MB.
> >> >>>
> >> >>> However, maintaining guests' kernel commandlines is something we'd
> >> >>> like to stay away from if possible.  It's certainly a short-term
> >> >>
> >> >>
> >> >> I don't quite understand why stay away from kernel command line. It provides
> >> >> more flexibility, allowing you to turn on/off it by yourself.
> >> >
> >> > I agree with Benjamin, it will result in customers have to tune their
> >> > guest OSes kernel command line or we supply guest images w/ kernel
> >> > command line modification.
> >> >
> >> > Regards,
> >> > Wanpeng Li
> >> >
> >> >>
> >> >>
> >> >>> answer, or something individual customers can choose to do today.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html