On Sat, Jun 21, 2008 at 12:02:31PM -0500, Steve French (smfrench@xxxxxxxxx) wrote:
> Your point about tcp_sendpage reminds me of something I have been
> wondering about for a while. Since SunRPC switched to kernel_sendpage
> I have been wondering whether that is better or worse than
> kernel_sendmsg. It looks like in some cases sendpage simply falls back
> to calling sendmsg with 1 iovec (instead of calling tcp_sendpage which
> calls do_tcp_sendpages), which would end up being slower than calling
> sendmsg with a larger iovec as we do in smb_send2 in the write path.

This happens for hardware which does not support hardware checksumming
and scatter-gather. sendpage() fundamentally requires both, since
sending is lockless and checksumming happens at the very end of the
transmit path, around the time the hardware DMAs the data onto the
wire. Software checksumming could produce a broken checksum if the
data were changed in flight.

Also note that kernel_sendpage() returning does not mean the data was
really sent, so modifying the page afterwards can corrupt the protocol.
It is also forbidden to sendpage slab pages.

> For the write case in which we are writing pages (that are aligned)
> out of the page cache to the socket, would sendpage be any faster
> than sendmsg? (I wish there was a send-multiple-pages call where I
> could pass the whole list of pages.) How should a piece of kernel
> code check to see if sendpage is supported/faster, and when to use
> kernel_sendpage and when to use sendmsg with the pages in the iovec?

You can simply check sk->sk_route_caps: it has to have the NETIF_F_SG
and NETIF_F_ALL_CSUM bits set to support sendpage(). (Sketches of this
check and of a per-page sending loop follow at the end of this mail.)

sendpage() is generally faster, since it does not perform a data copy
(or checksumming, although depending on how it is called, e.g. from
userspace, that may not be the main factor). With jumbo frames it
provides a more noticeable win, but for smaller MTUs the difference is
frequently not that big: in POHMELFS I did not get any better numbers
from switching to sendpage() instead of sendmsg(), in either CPU
utilization or performance, but I ran 1500-MTU tests on quite fast
machines over a GigE link and over a very slow (3 MB/s) link.
sendpage() also performs fewer allocations, and they are smaller than
those for sendmsg().

I'm actually surprised that in a bulk transfer sending page by page is
slower than sending lots of pages in one go. Of course batching should
be faster, but the difference should be very small, since it is only a
matter of grabbing the socket lock, which for bulk data sending should
not be an issue at all. At least in my tests I never saw a difference,
and easily hit the wire limit even when sending per page with a data
copy (i.e. sendmsg()).

-- 
	Evgeniy Polyakov
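
A minimal sketch of the capability check discussed above, not from the
original mail: xmit_page() and its one-entry copying fallback are
hypothetical illustration; only kernel_sendpage(), kernel_sendmsg(),
and the sk_route_caps bits come from the thread.

	#include <linux/net.h>
	#include <linux/netdevice.h>
	#include <linux/socket.h>
	#include <linux/highmem.h>
	#include <net/sock.h>

	static int xmit_page(struct socket *sock, struct page *page,
			     int offset, size_t len)
	{
		struct sock *sk = sock->sk;

		/* Take the zero-copy path only when the route supports
		 * scatter-gather and checksum offload; otherwise
		 * sendpage() would fall back to a one-iovec sendmsg()
		 * internally anyway. */
		if ((sk->sk_route_caps & NETIF_F_SG) &&
		    (sk->sk_route_caps & NETIF_F_ALL_CSUM))
			return kernel_sendpage(sock, page, offset, len,
					       MSG_MORE);

		/* Copying path: map the page and hand it to sendmsg().
		 * A real caller would batch many pages into one large
		 * kvec here, as smb_send2() does in the write path. */
		{
			struct kvec vec = {
				.iov_base = kmap(page) + offset,
				.iov_len  = len,
			};
			struct msghdr msg = { .msg_flags = MSG_MORE };
			int ret = kernel_sendmsg(sock, &msg, &vec, 1, len);

			kunmap(page);
			return ret;
		}
	}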
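
And a hedged sketch of the "send a whole list of pages" pattern asked
about above: there is no multi-page sendpage call, so the usual
approach is to loop with MSG_MORE set on all but the last page so TCP
keeps building full frames. send_page_list() is an illustrative name,
not an existing API.

	#include <linux/net.h>
	#include <linux/mm.h>
	#include <linux/socket.h>

	static int send_page_list(struct socket *sock, struct page **pages,
				  unsigned int nr_pages, size_t last_len)
	{
		unsigned int i;

		for (i = 0; i < nr_pages; i++) {
			int last = (i == nr_pages - 1);
			size_t len = last ? last_len : PAGE_SIZE;
			int offset = 0;

			while (len > 0) {
				/* The pages must not come from the slab
				 * and must not be modified until the data
				 * is actually on the wire: sendpage()
				 * references them in place, and its return
				 * does not mean the data was sent. */
				int ret = kernel_sendpage(sock, pages[i],
							  offset, len,
							  last ? 0 : MSG_MORE);
				if (ret < 0)
					return ret;
				offset += ret;
				len -= ret;
			}
		}
		return 0;
	}

Per the measurements above, the per-call socket lock this loop pays is
not expected to matter for bulk transfers.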