On Mon, 16 Apr 2018 23:15:50 -0700
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Mon, Apr 16, 2018 at 11:07:04PM +0200, Jesper Dangaard Brouer wrote:
> > On x86, swiotlb falls back (via get_dma_ops -> get_arch_dma_ops) to
> > using x86_swiotlb_dma_ops instead of swiotlb_dma_ops. I also included
> > that in the fix patch below.
>
> x86_swiotlb_dma_ops should not exist any more, and x86 now uses
> dma_direct_ops. Looks like you are applying it to an old kernel :)
>
> > Performance improved to 8.9 Mpps from approx 6.5 Mpps.
> >
> > (This was without my bulking for net_device->ndo_xdp_xmit, so that
> > number should improve more).
>
> What is the number for the otherwise comparable setup without retpolines?

Approx 12 Mpps.

You forgot to handle the dma_direct_mapping_error() case, which still
used the retpoline in the above 8.9 Mpps measurement. I fixed it up and
performance increased to 9.6 Mpps.

Notice that in this test there are still two retpoline/indirect calls
left: the net_device->ndo_xdp_xmit call and the invocation of the XDP
BPF prog.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
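
For readers following the thread, here is a minimal userspace sketch of the pattern being discussed: compare the ops pointer against the known direct ops and call the direct helper, so the common case avoids the indirect (retpoline) call. This is not the kernel patch. Every `demo_*` name, the ops struct and the error sentinel are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA ops table; not the kernel's struct. */
struct demo_dma_ops {
	bool (*mapping_error)(uint64_t dma_addr);
};

/* Assumed error sentinel, chosen only for this sketch. */
#define DEMO_MAPPING_ERROR (~(uint64_t)0)

/* Counts calls that had to go through the function pointer. */
static int demo_indirect_calls;

static bool demo_direct_mapping_error(uint64_t dma_addr)
{
	return dma_addr == DEMO_MAPPING_ERROR;
}

/* A second, made-up implementation that uses 0 as its error value. */
static bool demo_other_mapping_error(uint64_t dma_addr)
{
	return dma_addr == 0;
}

static const struct demo_dma_ops demo_direct_ops = {
	.mapping_error = demo_direct_mapping_error,
};

static const struct demo_dma_ops demo_other_ops = {
	.mapping_error = demo_other_mapping_error,
};

/*
 * Fast path: if the ops table is the direct one, call the helper by
 * name so the compiler emits a direct call. Anything else falls back
 * to the indirect call through the pointer.
 */
static bool demo_mapping_error(const struct demo_dma_ops *ops,
			       uint64_t dma_addr)
{
	assert(ops != NULL);

	if (ops == &demo_direct_ops)
		return demo_direct_mapping_error(dma_addr);

	demo_indirect_calls++;
	return ops->mapping_error(dma_addr);
}
```

Calling `demo_mapping_error(&demo_direct_ops, addr)` never touches the function pointer, while `demo_mapping_error(&demo_other_ops, addr)` increments `demo_indirect_calls`. The same pointer comparison has to be repeated in every wrapper on the hot path, which is why a missed case such as the mapping-error check can still cost an indirect call.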