On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
> > On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
> >> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
> >>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
> >>>> On 09/01/2013 10:06 PM, Gleb Natapov wrote:
> >>>>> On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
> >>>>>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> >>>>>> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
> >>>>>> them to user space, which saves the time spent switching to user space
> >>>>>> and back.
> >>>>>>
> >>>>>> Both real and virtual modes are supported. The kernel tries to
> >>>>>> handle a TCE request in real mode; if that fails, it passes the request
> >>>>>> to virtual mode to complete the operation. If the virtual mode
> >>>>>> handler fails, the request is passed to user space.
> >>>>>>
> >>>>>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
> >>>>>> user API functions are required for this patch.
> >>>>>>
> >>>>>> This adds a "SPAPR TCE IOMMU" KVM device to associate a logical bus
> >>>>>> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
> >>>>>> of map/unmap requests. The device supports a single attribute, which is
> >>>>>> a struct with the LIOBN and the IOMMU fd. When the attribute is set, the
> >>>>>> device establishes the connection between KVM and VFIO.
> >>>>>>
> >>>>>> Tests show that this patch increases transmission speed from 220MB/s
> >>>>>> to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).
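The three-level fallback described above (real mode, then virtual mode, then user space) can be sketched in C roughly as follows. This is a minimal model under stated assumptions, not the patch's code: the request struct, the handler names and the HC_* status codes are hypothetical stand-ins, loosely modeled on the PowerPC KVM convention of returning a "too hard" status to punt an hcall to the next level.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes for this sketch only; the kernel has its own
 * hcall status definitions, and these values are not taken from it. */
enum hcall_status { HC_SUCCESS = 0, HC_TOO_HARD = 9999 };

/* Which level ended up completing the hcall. */
enum handled_by { BY_REAL_MODE, BY_VIRTUAL_MODE, BY_USER_SPACE };

/* Hypothetical, heavily simplified H_PUT_TCE request. The two flags stand
 * in for conditions the real handlers would discover by themselves. */
struct tce_request {
	unsigned long liobn;     /* logical I/O bus number from the guest */
	unsigned long ioba;      /* I/O bus address of the TCE entry */
	unsigned long tce;       /* new TCE value */
	int liobn_linked;        /* LIOBN linked to an IOMMU group in KVM? */
	int page_ok_in_realmode; /* page can be referenced without sleeping? */
};

/* Real mode: translation off, no sleeping, no faults; bail out on anything hard. */
static enum hcall_status realmode_put_tce(const struct tce_request *r)
{
	if (!r->liobn_linked || !r->page_ok_in_realmode)
		return HC_TOO_HARD;
	return HC_SUCCESS; /* a real handler would update the TCE table here */
}

/* Virtual mode: may sleep and fault pages in, so only an unlinked LIOBN
 * still needs user space. */
static enum hcall_status virtmode_put_tce(const struct tce_request *r)
{
	if (!r->liobn_linked)
		return HC_TOO_HARD;
	return HC_SUCCESS;
}

/* Try each level in turn: HC_TOO_HARD means "pass to the next level",
 * any other status (success or a real error) goes back to the guest. */
enum handled_by handle_put_tce(const struct tce_request *r)
{
	assert(r != NULL);
	if (realmode_put_tce(r) != HC_TOO_HARD)
		return BY_REAL_MODE;
	if (virtmode_put_tce(r) != HC_TOO_HARD)
		return BY_VIRTUAL_MODE;
	return BY_USER_SPACE; /* e.g. QEMU emulates the hcall */
}
```

In this model a request whose page can be referenced safely completes in real mode, one that cannot falls back to virtual mode, and a LIOBN that user space never linked (per the 2013/05/20 changelog entry, the only remaining reason to exit) ends up in user space.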
> >>>>>>
> >>>>>> Signed-off-by: Paul Mackerras <paulus@xxxxxxxxx>
> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> >>>>>>
> >>>>>> ---
> >>>>>>
> >>>>>> Changes:
> >>>>>> v9:
> >>>>>> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
> >>>>>> KVM device
> >>>>>> * release_spapr_tce_table() is not shared between different TCE types
> >>>>>> * reduced the patch size by moving VFIO external API
> >>>>>> trampolines to a separate patch
> >>>>>> * moved documentation from Documentation/virtual/kvm/api.txt to
> >>>>>> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>>>>>
> >>>>>> v8:
> >>>>>> * fixed warnings from checkpatch.pl
> >>>>>>
> >>>>>> 2013/07/11:
> >>>>>> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
> >>>>>> for KVM_BOOK3S_64
> >>>>>> * kvmppc_gpa_to_hva_and_get also returns the host physical address. Not much
> >>>>>> use for it here, but the next patch for hugepages support will use it more.
> >>>>>>
> >>>>>> 2013/07/06:
> >>>>>> * added realmode arch_spin_lock to protect TCE table from races
> >>>>>> in real and virtual modes
> >>>>>> * POWERPC IOMMU API is changed to support real mode
> >>>>>> * iommu_take_ownership and iommu_release_ownership are protected by
> >>>>>> iommu_table's locks
> >>>>>> * VFIO external user API use rewritten
> >>>>>> * multiple small fixes
> >>>>>>
> >>>>>> 2013/06/27:
> >>>>>> * tce_list page is referenced now in order to protect it from accidental
> >>>>>> invalidation during H_PUT_TCE_INDIRECT execution
> >>>>>> * added use of the external user VFIO API
> >>>>>>
> >>>>>> 2013/06/05:
> >>>>>> * changed capability number
> >>>>>> * changed ioctl number
> >>>>>> * updated the doc article number
> >>>>>>
> >>>>>> 2013/05/20:
> >>>>>> * removed get_user() from real mode handlers
> >>>>>> * kvm_vcpu_arch::tce_tmp usage extended.
> >>>>>> Now the real mode handler puts translated TCEs there, tries
> >>>>>> realmode_get_page() on them and, if that fails, passes control to
> >>>>>> the virtual mode handler, which tries to finish handling the request
> >>>>>> * kvmppc_lookup_pte() now does realmode_get_page() protected by the BUSY
> >>>>>> bit on a page
> >>>>>> * The only reason to pass the request to user mode now is when user mode
> >>>>>> did not register the TCE table in the kernel; in all other cases the
> >>>>>> virtual mode handler is expected to do the job
> >>>>>> ---
> >>>>>> .../virtual/kvm/devices/spapr_tce_iommu.txt | 37 +++
> >>>>>> arch/powerpc/include/asm/kvm_host.h | 4 +
> >>>>>> arch/powerpc/kvm/book3s_64_vio.c | 310 ++++++++++++++++++++-
> >>>>>> arch/powerpc/kvm/book3s_64_vio_hv.c | 122 ++++++++
> >>>>>> arch/powerpc/kvm/powerpc.c | 1 +
> >>>>>> include/linux/kvm_host.h | 1 +
> >>>>>> virt/kvm/kvm_main.c | 5 +
> >>>>>> 7 files changed, 477 insertions(+), 3 deletions(-)
> >>>>>> create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>>>>>
> >>>>>> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>>>>> new file mode 100644
> >>>>>> index 0000000..4bc8fc3
> >>>>>> --- /dev/null
> >>>>>> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> >>>>>> @@ -0,0 +1,37 @@
> >>>>>> +SPAPR TCE IOMMU device
> >>>>>> +
> >>>>>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> >>>>>> +Architectures: powerpc
> >>>>>> +
> >>>>>> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
> >>>>>> +
> >>>>>> +Groups:
> >>>>>> + KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
> >>>>>> + Attributes: single attribute with pair { LIOBN, IOMMU fd}
> >>>>>> +
> >>>>>> +This is completely made up device which provides API to link
> >>>>>> +logical bus number (LIOBN) and IOMMU group. The user space has
> >>>>>> +to create a new SPAPR TCE IOMMU device per a logical bus.
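To make the linkage attribute concrete, here is a minimal C sketch of the bookkeeping it implies: storing { LIOBN, IOMMU fd } pairs and looking one up by the LIOBN that a TCE hcall carries. The struct layout, the table and the function names are hypothetical; the quoted documentation specifies only that the attribute is such a pair.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload of KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE. The quoted
 * documentation only says it is a pair { LIOBN, IOMMU fd }; the field
 * types and order here are assumptions. */
struct tce_iommu_linkage {
	uint64_t liobn;
	int32_t fd;
};

#define MAX_LINKS 8 /* arbitrary bound for the sketch */

/* Hypothetical per-VM table filled in when the attribute is set. A real
 * implementation would more likely hold a reference to the VFIO group
 * file than keep the raw fd number. */
struct tce_link_table {
	struct tce_iommu_linkage links[MAX_LINKS];
	size_t count;
};

/* Record one { LIOBN, fd } pair; each LIOBN may be linked only once. */
int tce_link_add(struct tce_link_table *t, const struct tce_iommu_linkage *l)
{
	assert(t != NULL && l != NULL);
	if (l->fd < 0)
		return -EBADF;
	for (size_t i = 0; i < t->count; i++)
		if (t->links[i].liobn == l->liobn)
			return -EEXIST;
	if (t->count == MAX_LINKS)
		return -ENOSPC;
	t->links[t->count++] = *l;
	return 0;
}

/* Look up the fd linked to the LIOBN of an incoming TCE hcall. -1 means
 * the LIOBN was never linked, so the hcall has to go to user space. */
int32_t tce_link_find(const struct tce_link_table *t, uint64_t liobn)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++)
		if (t->links[i].liobn == liobn)
			return t->links[i].fd;
	return -1;
}
```

With one device per logical bus, as the patch proposes, each such table would hold a single entry; the multi-link device Gleb asks about in his reply would hold several. Given the handful of buses described in the thread, a linear scan is plenty either way.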
> >>>>>> +
> >>>>> Why not have one device that can handle multiple links?
> >>>>
> >>>>
> >>>> I can do that. If I make it so, it won't even look like a device at all, just
> >>>> some weird interface to KVM, but OK. What bothers me is it is just a
> >>> Maybe I do not understand the usage pattern here. Why do you feel that a device
> >>> that can handle multiple links is worse than a device per link? How many logical
> >>> buses are there usually? How often are they created/destroyed? I am not insisting
> >>> on the change, just trying to understand why you do not like it.
> >>
> >>
> >> It is usually one PCI host bus adapter per IOMMU group, which is usually
> >> one PCI card, or 2-3 cards if it is legacy PCI-X, and they are created
> >> when QEMU-KVM starts. Not many. And they live until KVM ends.
> >>
> >> My point is: why would I want to put all links into one device? It is all just
> >> a matter of taste and nothing more. Or I am missing something, but I do not
> >> see what. If it is all about making things kosher/halal/orthodox, then
> >> I have more stuff to do, like reworking the emulated TCEs. But if it is for
> >> (I do not know, just guessing) performance or something like that, then
> >> I'll fix it; I just need to know what I am fixing.
> >>
> > Each device creates an fd; if you can have a lot of them, this will
> > eventually be a bottleneck. You are saying this is not the case, so let's go
> > with the proposed interface.
> >
>
> Did you decide not to answer the email which Ben sent yesterday, or did you just
> not see it? Just checking :)
>
Haven't seen it. Which one?

--
Gleb.