On 09/06/2013 04:01 PM, Gleb Natapov wrote:
> On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
>> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
>>> On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
>>>> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
>>>>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
>>>>>> On 09/01/2013 10:06 PM, Gleb Natapov wrote:
>>>>>>> On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
>>>>>>>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>>>>>>>> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
>>>>>>>> them to user space, which saves time on switching to user space and back.
>>>>>>>>
>>>>>>>> Both real and virtual modes are supported. The kernel tries to
>>>>>>>> handle a TCE request in real mode; if that fails, it passes the request
>>>>>>>> to virtual mode to complete the operation. If the virtual mode
>>>>>>>> handler fails, the request is passed to user space.
>>>>>>>>
>>>>>>>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
>>>>>>>> user API functions are required for this patch.
>>>>>>>>
>>>>>>>> This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
>>>>>>>> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
>>>>>>>> of map/unmap requests. The device supports a single attribute which is
>>>>>>>> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
>>>>>>>> establishes the connection between KVM and VFIO.
>>>>>>>>
>>>>>>>> Tests show that this patch increases transmission speed from 220MB/s
>>>>>>>> to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Changes:
>>>>>>>> v9:
>>>>>>>> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
>>>>>>>> KVM device
>>>>>>>> * release_spapr_tce_table() is not shared between different TCE types
>>>>>>>> * reduced the patch size by moving VFIO external API
>>>>>>>> trampolines to a separate patch
>>>>>>>> * moved documentation from Documentation/virtual/kvm/api.txt to
>>>>>>>> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>>
>>>>>>>> v8:
>>>>>>>> * fixed warnings from checkpatch.pl
>>>>>>>>
>>>>>>>> 2013/07/11:
>>>>>>>> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
>>>>>>>> for KVM_BOOK3S_64
>>>>>>>> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
>>>>>>>> for this here but the next patch for hugepages support will use it more.
>>>>>>>>
>>>>>>>> 2013/07/06:
>>>>>>>> * added realmode arch_spin_lock to protect TCE table from races
>>>>>>>> in real and virtual modes
>>>>>>>> * POWERPC IOMMU API is changed to support real mode
>>>>>>>> * iommu_take_ownership and iommu_release_ownership are protected by
>>>>>>>> iommu_table's locks
>>>>>>>> * VFIO external user API use rewritten
>>>>>>>> * multiple small fixes
>>>>>>>>
>>>>>>>> 2013/06/27:
>>>>>>>> * tce_list page is referenced now in order to protect it from accidental
>>>>>>>> invalidation during H_PUT_TCE_INDIRECT execution
>>>>>>>> * added use of the external user VFIO API
>>>>>>>>
>>>>>>>> 2013/06/05:
>>>>>>>> * changed capability number
>>>>>>>> * changed ioctl number
>>>>>>>> * update the doc article number
>>>>>>>>
>>>>>>>> 2013/05/20:
>>>>>>>> * removed get_user() from real mode handlers
>>>>>>>> * kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
>>>>>>>> translated TCEs there, tries realmode_get_page() on those and if it fails, it
>>>>>>>> passes control over to the virtual mode handler which tries to finish
>>>>>>>> the request handling
>>>>>>>> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
>>>>>>>> on a page
>>>>>>>> * The only reason to pass the request to user mode now is when the user mode
>>>>>>>> did not register TCE table in the kernel, in all other cases the virtual mode
>>>>>>>> handler is expected to do the job
>>>>>>>> ---
>>>>>>>>  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 +++
>>>>>>>>  arch/powerpc/include/asm/kvm_host.h                |   4 +
>>>>>>>>  arch/powerpc/kvm/book3s_64_vio.c                   | 310 ++++++++++++++++++++-
>>>>>>>>  arch/powerpc/kvm/book3s_64_vio_hv.c                | 122 ++++++++
>>>>>>>>  arch/powerpc/kvm/powerpc.c                         |   1 +
>>>>>>>>  include/linux/kvm_host.h                           |   1 +
>>>>>>>>  virt/kvm/kvm_main.c                                |   5 +
>>>>>>>>  7 files changed, 477 insertions(+), 3 deletions(-)
>>>>>>>>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>>
>>>>>>>> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..4bc8fc3
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>>>>>>> @@ -0,0 +1,37 @@
>>>>>>>> +SPAPR TCE IOMMU device
>>>>>>>> +
>>>>>>>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
>>>>>>>> +Architectures: powerpc
>>>>>>>> +
>>>>>>>> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
>>>>>>>> +
>>>>>>>> +Groups:
>>>>>>>> +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
>>>>>>>> +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
>>>>>>>> +
>>>>>>>> +This is completely made up device which provides API to link
>>>>>>>> +logical bus number (LIOBN) and IOMMU group. The user space has
>>>>>>>> +to create a new SPAPR TCE IOMMU device per a logical bus.
>>>>>>>> +
>>>>>>> Why not have one device that can handle multiple links?
>>>>>>
>>>>>>
>>>>>> I can do that. If I make it so, it won't even look like a device at all, just
>>>>>> some weird interface to KVM but ok. What bothers me is it is just a
>>>>> Maybe I do not understand the usage pattern here. Why do you feel that a device
>>>>> that can handle multiple links is worse than a device per link? How many logical
>>>>> buses are there usually? How often are they created/destroyed? I am not insisting
>>>>> on the change, just trying to understand why you do not like it.
>>>>
>>>>
>>>> It is usually one PCI host bus adapter per IOMMU group, which is usually
>>>> one PCI card or 2-3 cards if it is legacy PCI-X, and they are created
>>>> when QEMU-KVM starts. Not many. And they live till KVM ends.
>>>>
>>>> My point is why would I want to put all links into one device? It all is just
>>>> a matter of taste and nothing more. Or I am missing something but I do not
>>>> see what. If it is all about making things kosher/halal/orthodox, then
>>>> I have more stuff to do, like reworking the emulated TCEs. But if it is for
>>>> (I do not know, just guessing) performance or something like that - then
>>>> I'll fix it, I just need to know what I am fixing.
>>>>
>>> Each device creates an fd; if you can have a lot of them, eventually this
>>> will be a bottleneck. You are saying this is not the case, so let's go
>>> with the proposed interface.
>>
>>
>> Did you decide not to answer the email which Ben sent yesterday or you just
>> did not see it? Just checking :)
>>
> Haven't seen it. Which one?

Subject: Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
Date: Thu, 05 Sep 2013 14:05:09 +1000
From: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
To: Gleb Natapov <gleb@xxxxxxxxxx>
CC: Alexey Kardashevskiy <aik@xxxxxxxxx>, linuxppc-dev@xxxxxxxxxxxxxxxx,
 David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>, Paul Mackerras <paulus@xxxxxxxxx>,
 Paolo Bonzini <pbonzini@xxxxxxxxxx>, Alexander Graf <agraf@xxxxxxx>,
 kvm@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, kvm-ppc@xxxxxxxxxxxxxxx,
 linux-mm@xxxxxxxxx

--
Alexey
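
For anyone reading this thread later: the fallback order described in the quoted commit message (real mode first, then virtual mode, then user space) can be sketched roughly as below. This is only an illustration and not code from the patch. The names struct tce_request, enum tce_stage, handle_real_mode(), handle_virt_mode() and dispatch_tce() are all hypothetical, and whether a stage succeeds is simply a flag on the request rather than a real TCE translation.

```c
/* Hypothetical sketch of the three-stage fallback; not the patch's code. */

/* Which stage ended up completing the request. */
enum tce_stage {
    STAGE_REAL_MODE,
    STAGE_VIRT_MODE,
    STAGE_USER_SPACE
};

/* Hypothetical request: flags stand in for "this stage can finish it". */
struct tce_request {
    int real_mode_ok;   /* nonzero if the real mode handler can complete it */
    int virt_mode_ok;   /* nonzero if the virtual mode handler can complete it */
};

/* Returns 1 when the real mode handler completed the request, 0 otherwise. */
static int handle_real_mode(const struct tce_request *req)
{
    return req->real_mode_ok != 0;
}

/* Returns 1 when the virtual mode handler completed the request, 0 otherwise. */
static int handle_virt_mode(const struct tce_request *req)
{
    return req->virt_mode_ok != 0;
}

/*
 * Try real mode first, fall back to virtual mode, and only if both
 * fail hand the request to user space.
 */
static enum tce_stage dispatch_tce(const struct tce_request *req)
{
    if (handle_real_mode(req))
        return STAGE_REAL_MODE;
    if (handle_virt_mode(req))
        return STAGE_VIRT_MODE;
    return STAGE_USER_SPACE;
}
```

In this sketch a request that neither in-kernel stage can finish ends up in user space, which corresponds to the last changelog bullet where the TCE table was never registered with the kernel.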