On 10/05/2012 09:55 AM, Andi Kleen wrote:
> Alexander Duyck <alexander.h.duyck@xxxxxxxxx> writes:
>
>> While working on 10Gb/s routing performance I found a significant amount of
>> time was being spent in the swiotlb DMA handler. Further digging found that a
>> significant amount of this was due to virtual to physical address translation
>> and the overhead of calling the function that did it. It accounted for
>> nearly 60% of the total overhead.
>
> Can you find out why that is? Traditionally virt_to_phys was just a
> subtraction. Then later on it was an if and a subtraction.
>
> It cannot really be that expensive. Do you have some debugging enabled?
>
> Really virt_to_phys should be fixed. Such fundamental operations
> shouldn't be slow. I don't think hacking up all the users to work
> around this is the right way.
>
> Looking at the code a bit, someone (crazy) made it out of line.
> But that cannot explain that much overhead.
>
> -Andi

I was thinking the issue was all of the calls to relatively small functions
occurring in quick succession. The way most of this code is set up, one small
function calls another, which in turn calls another, and I would imagine the
resulting code fragmentation can have a significant negative impact. For
example, just the first patch in the series is enough to see a significant
performance gain, and that is simply because is_swiotlb_buffer becomes
inlined when I build it on my system.

The basic idea I had with these patches was to avoid making multiple calls
in quick succession and instead have all the data right there, so that the
swiotlb functions don't need to make many external calls, at least not until
they are actually dealing with bounce buffers, which are slower due to
locking anyway.

Thanks,

Alex

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/devel