Dear All, On 07.05.2024 13:20, Alexander Lobakin wrote: > Quite often, devices do not need dma_sync operations on x86_64 at least. > Indeed, when dev_is_dma_coherent(dev) is true and > dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu() > and friends do nothing. > > However, indirectly calling them when CONFIG_RETPOLINE=y consumes about > 10% of cycles on a cpu receiving packets from softirq at ~100Gbit rate. > Even if/when CONFIG_RETPOLINE is not set, there is a cost of about 3%. > > Add dev->need_dma_sync boolean and turn it off during the device > initialization (dma_set_mask()) depending on the setup: > dev_is_dma_coherent() for the direct DMA, !(sync_single_for_device || > sync_single_for_cpu) or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC, > advertised for non-NULL DMA ops. > Then later, if/when swiotlb is used for the first time, the flag > is reset back to on, from swiotlb_tbl_map_single(). > > On iavf, the UDP trafficgen with XDP_DROP in skb mode test shows > +3-5% increase for direct DMA. > > Suggested-by: Christoph Hellwig <hch@xxxxxx> # direct DMA shortcut > Co-developed-by: Eric Dumazet <edumazet@xxxxxxxxxx> > Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx> > Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> > --- > include/linux/device.h | 4 +++ > include/linux/dma-map-ops.h | 12 ++++++++ > include/linux/dma-mapping.h | 53 +++++++++++++++++++++++++++++++---- > kernel/dma/mapping.c | 55 +++++++++++++++++++++++++++++-------- > kernel/dma/swiotlb.c | 6 ++++ > 5 files changed, 113 insertions(+), 17 deletions(-) This patch landed in today's linux-next as commit f406c8e4b770 ("dma: avoid redundant calls for sync operations"). Unfortunately I found that it breaks some of the ARM 32bit boards by forcing skipping DMA sync operations on non-coherent systems. This happens because this patch hooks dma_need_sync=true initialization into set_dma_mask(), but set_dma_mask() is not called from all device drivers, especially from those which operates properly with the default 32bit dma mask (like most of the platform devices created by the OF layer). Frankly speaking I have no idea how this should be fixed. I expect that there are lots of broken devices after this change, because I don't remember that calling set_dma_mask() is mandatory for device drivers. After adding dma_set_mask(dev, DMA_BIT_MASK(32)) to the drivers relevant for my boards the issues are gone, but I'm not sure this is the right approach... > ... Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland