Vladislav Bolkhovitin wrote:
David Miller wrote:
From: Vladislav Bolkhovitin <vst@xxxxxxxx>
Date: Wed, 13 Aug 2008 22:35:34 +0400
This is because the target sends data in a zero-copy manner, so its
CPU is able to cope with the load, but on the initiator there are
additional data copies from skbs to the page cache and from the page
cache to the application.
If you've actually been reading at all what I've been saying in this
thread you'll see that I've described a method to do this copy
avoidance in a completely stateless manner.
You don't need to implement a TCP stack in the card in order to do
data placement optimizations. They can be done completely stateless.
Sure, I read what you wrote before replying (although, frankly, I
didn't get the idea). But I don't think that overall it would be as
efficient as full hardware offload. See my reply to Jeff Garzik about
that.
Also, large portions of the CPU overhead are transactional costs,
which are significantly reduced by existing technologies such as
LRO.
The test used Myricom Myri-10G cards (myri10ge driver), which support
LRO. Judging from the ethtool -S output, it was enabled. I've attached
that output so you can double-check.
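For anyone who wants to repeat that check, here is a minimal sketch of
how the ethtool -S output could be scanned for LRO activity. It assumes
the driver exposes counters whose names contain "LRO" and that a
nonzero value means aggregation took place. The function names and the
sample text are hypothetical and are not taken from the attached
output.

```python
import re


def lro_counters(ethtool_stats):
    """Return every counter from `ethtool -S` text whose name mentions LRO."""
    counters = {}
    for line in ethtool_stats.splitlines():
        # Statistic lines have the form "<name>: <integer>".
        match = re.match(r"\s*(.+?):\s*(-?\d+)\s*$", line)
        if match and "lro" in match.group(1).lower():
            counters[match.group(1)] = int(match.group(2))
    return counters


def lro_seems_active(ethtool_stats):
    """True if at least one LRO-related counter is nonzero."""
    return any(value > 0 for value in lro_counters(ethtool_stats).values())


# Hypothetical sample, made up for illustration only.
SAMPLE = """NIC statistics:
     rx_packets: 1000
     LRO aggregated: 900
     LRO flushed: 100
"""

if __name__ == "__main__":
    print(lro_counters(SAMPLE))
    print(lro_seems_active(SAMPLE))
```

In practice the text would come from running ethtool -S on the
interface under test; counter names differ between drivers, so the
"LRO" substring match is only a heuristic.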
Also, there wasn't a big difference between MTU 1500 and 9000, which is
another reason to think that LRO was working.
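To illustrate why the MTU comparison is relevant: without aggregation,
the receiver has to handle roughly six times as many packets at MTU
1500 as at MTU 9000, so a dominant per-packet cost should show up as a
clear difference between the two runs. The sketch below assumes a
saturated 10 Gb/s link and ignores header and framing overhead; both
are simplifying assumptions, not figures from the test.

```python
LINK_BITS_PER_SECOND = 10_000_000_000  # assumed: fully saturated 10 Gb/s link


def packets_per_second(mtu_bytes, link_bps=LINK_BITS_PER_SECOND):
    """Approximate packet rate when every packet carries a full MTU."""
    return link_bps / 8 / mtu_bytes


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, round(packets_per_second(mtu)))
    # Ratio of per-packet work between the two MTU settings.
    print(packets_per_second(1500) / packets_per_second(9000))
```

That is on the order of 830,000 packets per second at MTU 1500 against
about 140,000 at MTU 9000.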
Thus, apparently, LRO doesn't make a fundamental difference. Maybe this
particular implementation isn't very efficient; I don't have enough
information to tell.
Vlad