On (12/12/14 11:16), Sowmini Varadhan wrote:
> > But getting back to linux, 3 Gbps is a far cry from 10 Gbps.
> I need to spend some time collecting data to convince myself that
> this is purely because of HV/IOMMU inefficiency.

[e1000-devel has been Bcc'ed]

I collected the stats, and I have evidence that the HV is not the bottleneck at this point: I am running linux as the Tx side (TCP client) with 10 threads (iperf -c <addr> -P 10) against an iperf server that can handle 9-9.5 Gbps.

Baseline, with default settings (TSO enabled): 9-9.5 Gbps
TSO disabled using ethtool: drops badly, to 2-3 Gbps (!)
TSO disabled, plus the iommu patch that breaks up the monolithic lock: 8.5 Gbps (note: still no TSO!)

I'll share the iommu patch as an RFC in a separate email to sparclinux.

But the Rx side may have other bottlenecks: even with the iommu patch it is stuck at 3 Gbps, though I can get somewhat better numbers merely by disabling GRO (as recommended by the intel.com documentation), so 3 Gbps is probably not the ceiling here.

I am willing to believe that you can't do much better than approximately 8.5 Gbps without additional churn to the DMA design. But 3 Gbps Rx out of a max of 10 Gbps suggests that something other than the HV is holding linux/sparc Rx back. And it might not even be the DMA overhead, since Tx can pull 8.5 Gbps even with a map/unmap for each packet.

I'm still investigating the Rx side, but there are a lot of factors coming into play here: RPS, the qdisc, and so on. Suggestions for things to investigate are welcome.

--Sowmini
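
For anyone trying to reproduce these numbers: the TSO/GRO toggles, the RPS mask and the qdisc mentioned above map roughly to the knobs below. This is only a sketch; the interface name (eth0), the rx queue index and the CPU mask are placeholders, not the values from the actual test setup.

  # Tx side: disable TSO (the run that drops to 2-3 Gbps without the iommu patch)
  ethtool -K eth0 tso off

  # Rx side: disable GRO, per the intel.com tuning recommendation
  ethtool -K eth0 gro off

  # Verify which offloads are currently active
  ethtool -k eth0

  # Rx side: inspect/set the RPS CPU mask for the first rx queue (mask "f" is just an example)
  cat /sys/class/net/eth0/queues/rx-0/rps_cpus
  echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

  # Inspect the qdisc attached to the interface
  tc qdisc show dev eth0

  # The benchmark itself: 10 parallel TCP streams from the Tx side
  iperf -c <addr> -P 10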