Re: [PATCH net-next v3 3/7] iommu/dma: avoid expensive indirect calls for sync operations

On 2024-02-14 4:21 pm, Alexander Lobakin wrote:
When the IOMMU is on, the actual synchronization happens in the same cases
as with direct DMA. Advertise %DMA_F_CAN_SKIP_SYNC in the IOMMU DMA ops to
skip the (indirect) sync op calls for non-SWIOTLB buffers.

perf profile before the patch:

     18.53%  [kernel]       [k] gq_rx_skb
     14.77%  [kernel]       [k] napi_reuse_skb
      8.95%  [kernel]       [k] skb_release_data
      5.42%  [kernel]       [k] dev_gro_receive
      5.37%  [kernel]       [k] memcpy
<*>  5.26%  [kernel]       [k] iommu_dma_sync_sg_for_cpu
      4.78%  [kernel]       [k] tcp_gro_receive
<*>  4.42%  [kernel]       [k] iommu_dma_sync_sg_for_device
      4.12%  [kernel]       [k] ipv6_gro_receive
      3.65%  [kernel]       [k] gq_pool_get
      3.25%  [kernel]       [k] skb_gro_receive
      2.07%  [kernel]       [k] napi_gro_frags
      1.98%  [kernel]       [k] tcp6_gro_receive
      1.27%  [kernel]       [k] gq_rx_prep_buffers
      1.18%  [kernel]       [k] gq_rx_napi_handler
      0.99%  [kernel]       [k] csum_partial
      0.74%  [kernel]       [k] csum_ipv6_magic
      0.72%  [kernel]       [k] free_pcp_prepare
      0.60%  [kernel]       [k] __napi_poll
      0.58%  [kernel]       [k] net_rx_action
      0.56%  [kernel]       [k] read_tsc
<*>  0.50%  [kernel]       [k] __x86_indirect_thunk_r11
      0.45%  [kernel]       [k] memset

After the patch, the lines marked <*> no longer show up, and overall
CPU usage looks much better (~60% instead of ~72%):

     25.56%  [kernel]       [k] gq_rx_skb
      9.90%  [kernel]       [k] napi_reuse_skb
      7.39%  [kernel]       [k] dev_gro_receive
      6.78%  [kernel]       [k] memcpy
      6.53%  [kernel]       [k] skb_release_data
      6.39%  [kernel]       [k] tcp_gro_receive
      5.71%  [kernel]       [k] ipv6_gro_receive
      4.35%  [kernel]       [k] napi_gro_frags
      4.34%  [kernel]       [k] skb_gro_receive
      3.50%  [kernel]       [k] gq_pool_get
      3.08%  [kernel]       [k] gq_rx_napi_handler
      2.35%  [kernel]       [k] tcp6_gro_receive
      2.06%  [kernel]       [k] gq_rx_prep_buffers
      1.32%  [kernel]       [k] csum_partial
      0.93%  [kernel]       [k] csum_ipv6_magic
      0.65%  [kernel]       [k] net_rx_action

iavf gains +10% in Mpps on Rx. This also unblocks batched allocations
of XSk buffers when the IOMMU is active.

Acked-by: Robin Murphy <robin.murphy@xxxxxxx>

Co-developed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
---
  drivers/iommu/dma-iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 50ccc4f1ef81..4ab9ac13d362 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1707,7 +1707,8 @@ static size_t iommu_dma_opt_mapping_size(void)
  }
static const struct dma_map_ops iommu_dma_ops = {
-	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
+	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED |
+				  DMA_F_CAN_SKIP_SYNC,
  	.alloc			= iommu_dma_alloc,
  	.free			= iommu_dma_free,
  	.alloc_pages		= dma_common_alloc_pages,



