On Sat, Jun 21, 2008 at 12:02:31PM -0500, Steve French (smfrench@xxxxxxxxx) wrote:
> Your point about tcp_sendpage reminds me of something I have been
> wondering about for a while. Since SunRPC switched to kernel_sendpage
> I have been wondering whether that is better or worse than
> kernel_sendmsg. It looks like in some cases sendpage simply falls back
> to calling sendmsg with 1 iovec (instead of calling tcp_sendpage which
> calls do_tcp_sendpages), which would end up being slower than calling
> sendmsg with a larger iovec as we do in smb_send2 in the write path.

This happens for hardware which does not support hardware checksumming
and scatter-gather. sendpage() fundamentally requires both, since
sending is lockless and checksumming happens at the very end of the
transmit path, around the time the hardware DMAs the data onto the
wire. Software checksumming could produce a broken checksum if the
data were changed in flight.

Also note that kernel_sendpage() returning does not mean the data was
really sent, so modifying the page afterwards can corrupt the protocol.
It is also forbidden to sendpage slab pages.

> For the write case in which we are writing pages (that are aligned)
> out of the page cache to the socket, would sendpage be any faster
> than sendmsg? (I wish there was a send-multiple-pages call where I
> could pass the whole list of pages.) How should a piece of kernel
> code check to see if sendpage is supported/faster, and when to use
> kernel_sendpage and when to use sendmsg with the pages in the iovec?

You can simply check sk->sk_route_caps: it has to have the NETIF_F_SG
and NETIF_F_ALL_CSUM bits set to support sendpage(). (Sketches of this
check and of a per-page sending loop follow at the end of this mail.)

sendpage() is generally faster, since it does not perform a data copy
(or checksumming, although depending on how it is called, e.g. from
userspace, that may not be the main factor). With jumbo frames it
provides a more noticeable win, but for smaller MTUs the difference is
frequently not that big: in POHMELFS I did not get any better numbers
from switching to sendpage() instead of sendmsg(), in either CPU
utilization or performance, but I ran 1500-MTU tests on quite fast
machines over a GigE link and over a very slow (3 MB/s) link.
sendpage() also performs fewer allocations, and they are smaller than
those for sendmsg().

I'm actually surprised that in a bulk transfer sending page by page is
slower than sending lots of pages in one go. Of course batching should
be faster, but the difference should be very small, since it is only a
matter of grabbing the socket lock, which for bulk data sending should
not be an issue at all. At least in my tests I never saw a difference,
and easily hit the wire limit even when sending per page with a data
copy (i.e. sendmsg()).

-- 
	Evgeniy Polyakov
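
A minimal sketch of the capability check discussed above, not from the
original mail: xmit_page() and its one-entry copying fallback are
hypothetical illustration; only kernel_sendpage(), kernel_sendmsg(),
and the sk_route_caps bits come from the thread.

	#include <linux/net.h>
	#include <linux/netdevice.h>
	#include <linux/socket.h>
	#include <linux/highmem.h>
	#include <net/sock.h>

	static int xmit_page(struct socket *sock, struct page *page,
			     int offset, size_t len)
	{
		struct sock *sk = sock->sk;

		/* Take the zero-copy path only when the route supports
		 * scatter-gather and checksum offload; otherwise
		 * sendpage() would fall back to a one-iovec sendmsg()
		 * internally anyway. */
		if ((sk->sk_route_caps & NETIF_F_SG) &&
		    (sk->sk_route_caps & NETIF_F_ALL_CSUM))
			return kernel_sendpage(sock, page, offset, len,
					       MSG_MORE);

		/* Copying path: map the page and hand it to sendmsg().
		 * A real caller would batch many pages into one large
		 * kvec here, as smb_send2() does in the write path. */
		{
			struct kvec vec = {
				.iov_base = kmap(page) + offset,
				.iov_len  = len,
			};
			struct msghdr msg = { .msg_flags = MSG_MORE };
			int ret = kernel_sendmsg(sock, &msg, &vec, 1, len);

			kunmap(page);
			return ret;
		}
	}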
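
And a hedged sketch of the "send a whole list of pages" pattern asked
about above: there is no multi-page sendpage call, so the usual
approach is to loop with MSG_MORE set on all but the last page so TCP
keeps building full frames. send_page_list() is an illustrative name,
not an existing API.

	#include <linux/net.h>
	#include <linux/mm.h>
	#include <linux/socket.h>

	static int send_page_list(struct socket *sock, struct page **pages,
				  unsigned int nr_pages, size_t last_len)
	{
		unsigned int i;

		for (i = 0; i < nr_pages; i++) {
			int last = (i == nr_pages - 1);
			size_t len = last ? last_len : PAGE_SIZE;
			int offset = 0;

			while (len > 0) {
				/* The pages must not come from the slab
				 * and must not be modified until the data
				 * is actually on the wire: sendpage()
				 * references them in place, and its return
				 * does not mean the data was sent. */
				int ret = kernel_sendpage(sock, pages[i],
							  offset, len,
							  last ? 0 : MSG_MORE);
				if (ret < 0)
					return ret;
				offset += ret;
				len -= ret;
			}
		}
		return 0;
	}

Per the measurements above, the per-call socket lock this loop pays is
not expected to matter for bulk transfers.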