Re: Unclear semantics of "sendfile"

"Markus Mottl" <markus.mottl@xxxxxxxxx> · Fri, 22 Sep 2006 22:12:45 -0400

On 9/22/06, Stephen Hemminger <shemminger@xxxxxxxx> wrote:
> sendfile() as far as I know should behave the same as send().

Not quite: if I am not mistaken, "send" always returns once the send
buffer is full, unless the buffer was already full at the start of the
system call; in that case, whether it returns depends on the blocking
mode.  But "sendfile" may send all data in a single call unless the
socket is non-blocking or a timeout occurs.  I'm not sure, though,
whether the latter is always true: e.g. things might work differently
with network cards that don't support scatter/gather.

Furthermore, "send" accepts flags that modulate its behaviour, which
"sendfile" does not.  E.g. instead of passing the flag "MSG_DONTWAIT"
to "send", I may need to perform a second system call, "fcntl", before
"sendfile" to set the socket to non-blocking mode (and possibly a third
one afterwards to set it back to blocking mode).

> Partial send is not an error, it could be caused by premature
> close from the other end as well. Therefore, the application should
> not expect errno to be set. The application needs to do another
> write/send/sendfile to force continuation and error detection.
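The continuation described in the quoted advice amounts to a loop like the following minimal sketch (Linux "sendfile" signature; `sendfile_all` is a hypothetical helper name).  A short count just leads to another call; only that next call reports an error through -1/errno, and a return of 0 means the file ended early, e.g. because it was truncated.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Send `count` bytes of `fd`, starting at *offset, to `sock`.
 * Returns 0 on success, -1 with errno set on error, or -2 if the file
 * ended before `count` bytes were sent (e.g. it was truncated). */
int sendfile_all(int sock, int fd, off_t *offset, size_t count)
{
    while (count > 0) {
        ssize_t n = sendfile(sock, fd, offset, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted before anything was sent: retry */
            return -1;      /* real error, e.g. EAGAIN after a send timeout */
        }
        if (n == 0)
            return -2;      /* end of file reached early */
        /* A partial transfer is not an error: the kernel has already
         * advanced *offset, so simply ask for the rest. */
        count -= (size_t)n;
    }
    return 0;
}
```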

That's true, but as far as I know, this makes it impossible to detect
a socket timeout immediately.  There is no way to guarantee that it
won't take at least twice as long to time out as configured with the
SO_SNDTIMEO socket option.  In fact, if not done correctly, it might
even take three times as long!
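For reference, the timeout in question is configured per socket.  A minimal sketch follows; `set_send_timeout` is a hypothetical helper name.  As described in socket(7), a blocking call that transfers nothing within this time fails with EAGAIN/EWOULDBLOCK, while one that transferred some data returns the short count, which is why the timeout is not reported on the first call.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <sys/socket.h>
#include <sys/time.h>

/* Set SO_SNDTIMEO on `sock` to `ms` milliseconds.
 * Returns 0 on success, -1 with errno set on error. */
int set_send_timeout(int sock, long ms)
{
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}
```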

The way I currently handle this (sendfile on sockets with timeouts) is
as follows:

 1) call sendfile -> returns with a partial result
 2) set socket to non-blocking mode
 3) call sendfile with the rest of the file, just to fill up whatever
    room is left in the send buffer
 4) if there is still data to send, set socket back to blocking mode
 5) call sendfile with what remains, so that it can time out again
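The five steps above could be sketched in C roughly as follows.  This is an illustration under stated assumptions, not Markus's actual code: Linux "sendfile" signature, a blocking socket with SO_SNDTIMEO already set, and hypothetical helper names `sendfile_timeout` and `set_nonblocking`.  Unlike step 4, it always restores blocking mode after step 3, and it leaves EINTR to the caller.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Toggle O_NONBLOCK on a socket (F_GETFL + F_SETFL). */
static int set_nonblocking(int sock, int on)
{
    int flags = fcntl(sock, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(sock, F_SETFL, flags);
}

/* Returns 0 on success, -1 with errno set on error (EAGAIN/EWOULDBLOCK
 * means SO_SNDTIMEO expired without progress), or -2 on early EOF. */
int sendfile_timeout(int sock, int fd, off_t *offset, size_t count)
{
    while (count > 0) {
        /* 1) and 5): blocking call, returns a short count or times out */
        ssize_t n = sendfile(sock, fd, offset, count);
        if (n == -1)
            return -1;
        if (n == 0)
            return -2;
        count -= (size_t)n;
        if (count == 0)
            break;

        /* 2) switch to non-blocking mode */
        if (set_nonblocking(sock, 1) == -1)
            return -1;
        /* 3) fill whatever room the send buffer has, without waiting */
        n = sendfile(sock, fd, offset, count);
        int saved_errno = errno;
        /* 4) back to blocking mode so the next call honours SO_SNDTIMEO */
        if (set_nonblocking(sock, 0) == -1)
            return -1;
        if (n == -1 && saved_errno != EAGAIN && saved_errno != EWOULDBLOCK) {
            errno = saved_errno;
            return -1;
        }
        if (n == 0)
            return -2;
        if (n > 0)
            count -= (size_t)n;
    }
    return 0;
}
```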

If I didn't do 2) and 4), it could take three times the configured
socket timeout before I knew that it really was a timeout.  With the
above trick it "only" takes twice as long.  I also have to perform four
additional system calls here with "sendfile" (or even more if I want
to check whether the file was truncated!).  "send" is better here,
because I can pass "MSG_DONTWAIT" to switch to non-blocking mode for
just the intermediate call that fills the buffer.
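For comparison, the same procedure written with "send" needs no "fcntl" calls.  A minimal sketch, assuming a blocking socket with SO_SNDTIMEO already set; `send_timeout` is a hypothetical helper name.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations under strict -std= modes */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 0 on success or -1 with errno set (EAGAIN/EWOULDBLOCK means
 * SO_SNDTIMEO expired without progress). */
int send_timeout(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        /* blocking call: returns a short count or times out */
        ssize_t n = send(sock, buf, len, 0);
        if (n == -1)
            return -1;
        buf += n;
        len -= (size_t)n;
        if (len == 0)
            break;

        /* fill the remaining buffer space without waiting; the flag
         * affects only this call, so no mode switch is needed */
        n = send(sock, buf, len, MSG_DONTWAIT);
        if (n == -1) {
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;
        } else {
            buf += n;
            len -= (size_t)n;
        }
    }
    return 0;
}
```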

I think in practice it may not be a big deal performance-wise to
perform some extra system calls, because the mentioned events
(timeouts, signals, truncated files) are presumably exceedingly rare
for the vast majority of applications.  But the code looks awful, and
some people also may not know how to handle this case correctly.  It
also took me a while to figure out that setting the socket to
non-blocking mode and forcing the send buffer to be filled helps.
And, most importantly, users may still be surprised to see that it
sometimes takes twice as long to time out as expected.

Abusing "errno" for the purpose of passing information back to the
caller might be a workaround, but then again it may also just confuse
other users, because they may not expect "errno" to behave that way.
Sigh.

Maybe somebody has an even better solution for the above problem, one
that preserves the timeout span with "sendfile"?  I'd be very
grateful...

Regards,
Markus

--
Markus Mottl        http://www.ocaml.info        markus.mottl@xxxxxxxxx