Re: [PATCH v2] vfio iommu type1: Improve vfio_iommu_type1_pin_pages performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 12/9/20 6:54 AM, Cornelia Huck wrote:
On Tue, 8 Dec 2020 21:55:53 +0800
"xuxiaoyang (C)" <xuxiaoyang2@xxxxxxxxxx> wrote:

On 2020/11/21 15:58, xuxiaoyang (C) wrote:
vfio_pin_pages() accepts an array of unrelated iova pfns and processes
each to return the physical pfn.  When dealing with large arrays of
contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
it is processed page by page.In this case, we can divide the iova pfn
array into multiple continuous ranges and optimize them.  For example,
when the iova pfn array is {1,5,6,7,9}, it will be divided into three
groups {1}, {5,6,7}, {9} for processing.  When processing {5,6,7}, the
number of calls to pin_user_pages_remote is reduced from 3 times to once.
For single page or large array of discontinuous iovas, we still use
vfio_pin_page_external to deal with it to reduce the performance loss
caused by refactoring.

Signed-off-by: Xiaoyang Xu <xuxiaoyang2@xxxxxxxxxx>

(...)


hi Cornelia Huck, Eric Farman, Zhenyu Wang, Zhi Wang

vfio_pin_pages() accepts an array of unrelated iova pfns and processes
each to return the physical pfn.  When dealing with large arrays of
contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
it is processed page by page.  In this case, we can divide the iova pfn
array into multiple continuous ranges and optimize them.  I have a set
of performance test data for reference.

The patch was not applied
                     1 page           512 pages
no huge pages:     1638ns           223651ns
THP:               1668ns           222330ns
HugeTLB:           1526ns           208151ns

The patch was applied
                     1 page           512 pages
no huge pages       1735ns           167286ns
THP:               1934ns           126900ns
HugeTLB:           1713ns           102188ns

As Alex Williamson said, this patch lacks proof that it works in the
real world. I think you will have some valuable opinions.

Looking at this from the vfio-ccw angle, I'm not sure how much this
would buy us, as we deal with IDAWs, which are designed so that they
can be non-contiguous. I guess this depends a lot on what the guest
does.

This would be my concern too, but I don't have data off the top of my head to say one way or another...


Eric, any opinion? Do you maybe also happen to have a test setup that
mimics workloads actually seen in the real world?


...I do have some test setups, which I will try to get some data from in a couple days. At the moment I've broken most of those setups trying to implement some other stuff, and can't revert back at the moment. Will get back to this.

Eric



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux