On Sat, 14 Apr 2018 21:29:26 +0200
David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:

> On Fri, 2018-04-13 at 19:26 +0200, Christoph Hellwig wrote:
> > On Fri, Apr 13, 2018 at 10:12:41AM -0700, Tushar Dave wrote:
> > > I guess there is nothing we need to do!
> > >
> > > On x86, in case of no Intel IOMMU or the IOMMU is disabled, you end up
> > > in swiotlb for DMA API calls when the system has 4G memory.
> > > However, AFAICT, for 64bit DMA capable devices the swiotlb DMA APIs do
> > > not use the bounce buffer unless you have swiotlb=force specified on
> > > the kernel command line.
> >
> > Sure.  But that means every sync_*_to_device and sync_*_to_cpu now
> > involves an indirect call to do exactly nothing, which in the workload
> > Jesper is looking at is causing a huge performance degradation due to
> > retpolines.

Yes, exactly.

> We should look at using the
>
> 	if (dma_ops == swiotlb_dma_ops)
> 		swiotlb_map_page()
> 	else
> 		dma_ops->map_page()
>
> trick for this. Perhaps with alternatives so that when an Intel or AMD
> IOMMU is detected, it's *that* which is checked for as the special
> case.

Yes, this trick is basically what I'm asking for :-)

It did sound like Hellwig wanted to first avoid/fix that x86 ends up
defaulting to swiotlb.  Thus, we just have to do the same trick with the
new default fall-through dma_ops.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
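
[Editor's sketch] To make the compare-and-direct-call trick quoted above
concrete, here is a minimal standalone C illustration of the pattern: when
the installed ops pointer matches the known default, the wrapper calls the
implementation directly, so the compiler emits a plain call rather than a
retpoline-protected indirect call; otherwise it falls back to the indirect
call through the ops table. The names my_dma_ops, default_dma_ops,
default_map_page and my_dma_map_page are hypothetical stand-ins for the
kernel's dma_map_ops / swiotlb_dma_ops machinery, not the actual kernel API.

	#include <stdio.h>

	/* Hypothetical stand-in for struct dma_map_ops. */
	struct my_dma_ops {
		void *(*map_page)(void *page, unsigned long offset, size_t size);
	};

	/* Stand-in for swiotlb_map_page(): the common no-IOMMU default. */
	static void *default_map_page(void *page, unsigned long offset, size_t size)
	{
		return (char *)page + offset;	/* identity mapping, no bounce buffer */
	}

	static const struct my_dma_ops default_dma_ops = {
		.map_page = default_map_page,
	};

	/* Installed at boot; could be replaced when an IOMMU driver takes over. */
	static const struct my_dma_ops *dma_ops = &default_dma_ops;

	static void *my_dma_map_page(void *page, unsigned long offset, size_t size)
	{
		/* Fast path: known default ops, direct call, no retpoline. */
		if (dma_ops == &default_dma_ops)
			return default_map_page(page, offset, size);

		/* Slow path: genuine indirect call through the ops table. */
		return dma_ops->map_page(page, offset, size);
	}

	int main(void)
	{
		char page[4096];

		printf("mapped at %p\n", my_dma_map_page(page, 128, 64));
		return 0;
	}

The kernel variant discussed in the thread would additionally want the
comparison patched via alternatives once an Intel or AMD IOMMU is detected,
so the fast path always matches whichever ops implementation is actually in
use on the running system.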