On 9/22/06, Stephen Hemminger <shemminger@xxxxxxxx> wrote:
> sendfile() as far as I know should behave the same as send().
Not quite: if I am not mistaken, "send" always returns once the send buffer is full, unless the buffer was already full at the start of the system call; in that case, whether it returns depends on the blocking mode. "sendfile", however, may send all data in a single call, unless the socket is non-blocking or a timeout occurs. I'm not sure, though, whether the latter is always true: e.g. things might work differently with network cards that don't support scatter/gather.

Furthermore, "send" accepts flags that modify its behaviour, which "sendfile" does not. E.g. instead of passing the flag "MSG_DONTWAIT" to "send", I may need to make a second system call, "fcntl", before "sendfile" to set the socket to non-blocking mode (and possibly a third one afterwards to set it back to blocking mode).
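The following minimal C sketch makes the difference concrete. It assumes Linux and a socket that is normally in blocking mode, and the helper names send_nowait() and sendfile_nowait() are hypothetical. send() takes MSG_DONTWAIT for a single call. sendfile() has no flags argument, so O_NONBLOCK has to be toggled with fcntl() around it:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* send(): non-blocking for this one call only, via a per-call flag. */
static ssize_t send_nowait(int sock, const void *buf, size_t len)
{
    return send(sock, buf, len, MSG_DONTWAIT);
}

/* sendfile(): no flags argument, so O_NONBLOCK is toggled around the call.
   This costs up to three extra fcntl() calls (get, set, restore). */
static ssize_t sendfile_nowait(int sock, int file_fd, off_t *offset, size_t count)
{
    int flags = fcntl(sock, F_GETFL);
    if (flags == -1)
        return -1;
    if (!(flags & O_NONBLOCK) && fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = sendfile(sock, file_fd, offset, count);
    int saved_errno = errno;          /* fcntl() below must not clobber it */

    /* Put the socket back into blocking mode if we changed it. */
    if (!(flags & O_NONBLOCK))
        fcntl(sock, F_SETFL, flags);

    errno = saved_errno;
    return n;
}
```

O_NONBLOCK belongs to the open file description. While it is set, other threads using the same socket also see non-blocking behaviour. MSG_DONTWAIT affects only the one call that passes it.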
> Partial send is not an error, it could be caused by premature close from the other end as well. Therefore, the application should not expect errno to be set. The application needs to do another write/send/sendfile to force continuation and error detection.
That's true, but as far as I know, this makes it completely impossible to detect a socket timeout immediately. There is no way to guarantee that a timeout won't take at least twice as long as configured with the SO_SNDTIMEO socket option. In fact, if not done correctly, it might even take three times as long!

The way I currently handle this (sendfile on sockets with timeouts) is as follows:

1) call sendfile -> returns a partial result
2) set the socket to non-blocking mode
3) call sendfile with the rest of the file, just to fill the send buffer if possible
4) if there is still data to send, set the socket back to blocking mode
5) call sendfile with what remains, letting it time out again if necessary

If I didn't do 2) and 4), it could take three times the configured socket timeout before I knew that it really is a timeout. With the above trick it "only" takes twice as long. I also have to perform four additional system calls here with "sendfile" (or even more if I want to check whether the file was truncated!). "send" is better here, because I can pass "MSG_DONTWAIT" to switch to non-blocking mode for just the intermediate call that fills the buffer.

I think in practice the extra system calls may not matter much performance-wise, because the events mentioned (timeouts, signals, truncated files) are presumably exceedingly rare for the vast majority of applications. But the code looks awful, and some people may not know how to handle this case correctly. It also took me a while to figure out that setting the socket to non-blocking mode and forcing the send buffer to be filled helps. And, most importantly, users may still be surprised that it sometimes takes twice as long to time out as expected.

Abusing "errno" to pass information back to the caller might be a workaround, but then again it may just confuse other users, who may not expect "errno" to behave that way. Sigh.
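The five steps above could be written roughly as follows. This is a sketch, not the code from this message. It makes several assumptions:

- Linux.
- A socket in blocking mode with SO_SNDTIMEO already configured. The hypothetical helper set_send_timeout() does that.
- Steps 2) to 5) repeat whenever the blocking call returns another partial result.

Reporting a truncated file as ENODATA is an arbitrary choice made here, not something sendfile() does itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the SO_SNDTIMEO send timeout in milliseconds. */
static int set_send_timeout(int sock, long millis)
{
    struct timeval tv = { .tv_sec = millis / 1000,
                          .tv_usec = (millis % 1000) * 1000 };
    return setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
}

/* Sends `count` bytes of `file_fd`, starting at `*offset`, to the blocking
   socket `sock`, following steps 1) to 5) above. Returns 0 when everything
   was sent, or -1 with errno set: EAGAIN/EWOULDBLOCK means a blocking call
   hit SO_SNDTIMEO without sending anything; ENODATA (our own choice) means
   the file ended early. sendfile() advances `*offset`, so it shows progress. */
static int sendfile_timed(int sock, int file_fd, off_t *offset, size_t count)
{
    int flags = fcntl(sock, F_GETFL);
    if (flags == -1)
        return -1;

    while (count > 0) {
        /* Steps 1) and 5): blocking call, bounded by SO_SNDTIMEO. */
        ssize_t n = sendfile(sock, file_fd, offset, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {                 /* EOF before `count` bytes */
            errno = ENODATA;
            return -1;
        }
        count -= (size_t)n;
        if (count == 0)
            break;

        /* Step 2): partial result, so switch to non-blocking mode. */
        if (fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
        /* Step 3): fill whatever room is left in the send buffer. */
        n = sendfile(sock, file_fd, offset, count);
        int saved_errno = errno;
        /* Step 4): restore blocking mode so SO_SNDTIMEO applies again. */
        if (fcntl(sock, F_SETFL, flags) == -1)
            return -1;

        if (n > 0) {
            count -= (size_t)n;
        } else if (n == 0) {
            errno = ENODATA;
            return -1;
        } else if (saved_errno != EAGAIN && saved_errno != EWOULDBLOCK &&
                   saved_errno != EINTR) {
            errno = saved_errno;
            return -1;
        }
    }
    return 0;
}
```

With "send", steps 2) and 4) disappear. Step 3) becomes a single send(sock, buf, len, MSG_DONTWAIT) call, which saves the two fcntl() calls per round.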
Maybe somebody has an even better solution for the above problem, one that preserves the timeout span with "sendfile"? I'd be very grateful...

Regards,
Markus

-- 
Markus Mottl http://www.ocaml.info markus.mottl@xxxxxxxxx