On 2020/12/9 19:54, Cornelia Huck wrote:
> On Tue, 8 Dec 2020 21:55:53 +0800
> "xuxiaoyang (C)" <xuxiaoyang2@xxxxxxxxxx> wrote:
>
>> On 2020/11/21 15:58, xuxiaoyang (C) wrote:
>>> vfio_pin_pages() accepts an array of unrelated iova pfns and processes
>>> each to return the physical pfn. When dealing with large arrays of
>>> contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
>>> it is processed page by page. In this case, we can divide the iova pfn
>>> array into multiple continuous ranges and optimize them. For example,
>>> when the iova pfn array is {1,5,6,7,9}, it will be divided into three
>>> groups {1}, {5,6,7}, {9} for processing. When processing {5,6,7}, the
>>> number of calls to pin_user_pages_remote is reduced from 3 times to once.
>>> For single page or large array of discontinuous iovas, we still use
>>> vfio_pin_page_external to deal with it to reduce the performance loss
>>> caused by refactoring.
>>>
>>> Signed-off-by: Xiaoyang Xu <xuxiaoyang2@xxxxxxxxxx>
>
> (...)
>
>>
>> hi Cornelia Huck, Eric Farman, Zhenyu Wang, Zhi Wang
>>
>> vfio_pin_pages() accepts an array of unrelated iova pfns and processes
>> each to return the physical pfn. When dealing with large arrays of
>> contiguous iovas, vfio_iommu_type1_pin_pages is very inefficient because
>> it is processed page by page. In this case, we can divide the iova pfn
>> array into multiple continuous ranges and optimize them. I have a set
>> of performance test data for reference.
>>
>> The patch was not applied
>>                     1 page      512 pages
>> no huge pages:      1638ns      223651ns
>> THP:                1668ns      222330ns
>> HugeTLB:            1526ns      208151ns
>>
>> The patch was applied
>>                     1 page      512 pages
>> no huge pages:      1735ns      167286ns
>> THP:                1934ns      126900ns
>> HugeTLB:            1713ns      102188ns
>>
>> As Alex Williamson said, this patch lacks proof that it works in the
>> real world. I think you will have some valuable opinions.
>
> Looking at this from the vfio-ccw angle, I'm not sure how much this
> would buy us, as we deal with IDAWs, which are designed so that they
> can be non-contiguous. I guess this depends a lot on what the guest
> does.
>
> Eric, any opinion? Do you maybe also happen to have a test setup that
> mimics workloads actually seen in the real world?
>

Thank you for your reply. The iova array constructed using pfn_array_alloc
is contiguous, and I think there will be some performance improvements here.

Regards,
Xu
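
For reference, the grouping step from the quoted patch description can be
sketched in plain userspace C. This is only an illustration under my own
assumptions, not the code from the patch: contiguous_run_len() and
count_contiguous_runs() are hypothetical names, and nothing here touches the
vfio or mm APIs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): return the length of the run
 * of consecutive pfns that begins at pfns[start].
 */
static size_t contiguous_run_len(const unsigned long *pfns, size_t n,
                                 size_t start)
{
    size_t len = 1;

    assert(start < n);
    /* Extend the run while the next pfn is exactly the previous one + 1. */
    while (start + len < n &&
           pfns[start + len] == pfns[start + len - 1] + 1)
        len++;
    return len;
}

/*
 * Hypothetical helper: count how many batched pin operations the array
 * would need if each contiguous run were handled in a single call.
 */
static size_t count_contiguous_runs(const unsigned long *pfns, size_t n)
{
    size_t runs = 0;

    /* Jump from the start of one run to the start of the next. */
    for (size_t i = 0; i < n; i += contiguous_run_len(pfns, n, i))
        runs++;
    return runs;
}
```

With the array {1,5,6,7,9} the run lengths are 1, 3 and 1, so
count_contiguous_runs() reports three runs, matching the groups {1}, {5,6,7},
{9} in the description above.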