From: Sowmini Varadhan <sowmini.varadhan@xxxxxxxxxx>
Date: Fri, 19 Dec 2014 10:16:16 -0500

> In iperf experiments running linux as the Tx side (TCP client) with
> 10 threads results in a severe performance drop when TSO is disabled,
> indicating a weakness in the software that turns out to be avoidable
> after this patch.
>
> Baseline numbers before this patch:
>    with default settings (TSO enabled) :    9-9.5 Gbps
>    Disable TSO using ethtool- drops badly:  2-3 Gbps. (!)
>
> What this patch does:
> Output from lockstat flags the iommu->lock as the hottest
> lock, showing something of the order of 21M contentions out of
> 27M acquisitions, and an average wait time of 26 us for the lock.
> This is not efficient. A better design is to follow the ppc model,
> where the iommu_table has multiple pools, each stretching over a
> segment of the map, and with a separate lock for each pool. This
> model allows for better parallelization of the iommu map search.
>
> After this patch, iperf client with 10 threads, can give a
> throughput of at least 8.5 Gbps, even when TSO is disabled.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@xxxxxxxxxx>

If this is such a better and more scalable algorithm for IOMMU arena
DMA region allocation, then instead of one platform after another
putting a private implementation under arch/, the generic IOMMU code
should be adjusted instead.

Right?
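
For readers following along, the pooled design described in the quoted
patch text (several pools, each covering one segment of the map, each
with its own lock) can be sketched roughly as below. This is only a
userspace illustration, not code from the patch or from the ppc
implementation: every name here (pooled_map, pool_alloc_locked,
MAP_ENTRIES, NR_POOLS, the cpu_hint argument) is hypothetical, a
pthread mutex stands in for a kernel spinlock, and the sketch assumes
that an allocation never spans two pools.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAP_ENTRIES 1024                 /* hypothetical map size */
#define NR_POOLS    4                    /* hypothetical pool count */
#define POOL_SIZE   (MAP_ENTRIES / NR_POOLS)

struct pool {
    pthread_mutex_t lock;                /* protects only this segment */
    size_t start;                        /* first entry of the segment */
    size_t end;                          /* one past the last entry */
};

struct pooled_map {
    bool used[MAP_ENTRIES];              /* one flag per map entry */
    struct pool pools[NR_POOLS];
};

void pooled_map_init(struct pooled_map *m)
{
    memset(m->used, 0, sizeof(m->used));
    for (size_t i = 0; i < NR_POOLS; i++) {
        pthread_mutex_init(&m->pools[i].lock, NULL);
        m->pools[i].start = i * POOL_SIZE;
        m->pools[i].end = (i + 1) * POOL_SIZE;
    }
}

/* First-fit search for n contiguous free entries inside one pool.
 * The caller must hold p->lock. Returns the first index, or -1. */
static long pool_alloc_locked(struct pooled_map *m, struct pool *p, size_t n)
{
    size_t run = 0;

    for (size_t i = p->start; i < p->end; i++) {
        run = m->used[i] ? 0 : run + 1;
        if (run == n) {
            size_t first = i + 1 - n;
            for (size_t j = first; j <= i; j++)
                m->used[j] = true;
            return (long)first;
        }
    }
    return -1;
}

/* Start at the pool selected by cpu_hint and fall back to the other
 * pools in order. Only one pool lock is held at any time, so threads
 * that start in different pools do not contend with each other. */
long pooled_map_alloc(struct pooled_map *m, unsigned int cpu_hint, size_t n)
{
    if (n == 0 || n > POOL_SIZE)
        return -1;

    for (size_t k = 0; k < NR_POOLS; k++) {
        struct pool *p = &m->pools[(cpu_hint + k) % NR_POOLS];
        long idx;

        pthread_mutex_lock(&p->lock);
        idx = pool_alloc_locked(m, p, n);
        pthread_mutex_unlock(&p->lock);
        if (idx >= 0)
            return idx;
    }
    return -1;
}

/* Release n entries starting at first; the owning pool follows from
 * the index, since allocations never cross a pool boundary. */
void pooled_map_free(struct pooled_map *m, size_t first, size_t n)
{
    struct pool *p = &m->pools[first / POOL_SIZE];

    pthread_mutex_lock(&p->lock);
    for (size_t j = first; j < first + n; j++)
        m->used[j] = false;
    pthread_mutex_unlock(&p->lock);
}
```

With a single lock, all ten iperf threads would serialize on every map
search. In the sketch, a thread only takes the lock of the pool it is
currently searching, which is the parallelization the patch text
refers to. How a real implementation chooses the starting pool and
handles large allocations is not covered here.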