I truly appreciate all the answers. Makes sense!!
Most of my background is in systems where reducing (or even eliminating) memory copies by the CPU was the holy grail (using RDMA and similar techniques). I do realize that, compared to all the other overheads in the network and OpenSSL path, the extra memory copy isn't as prohibitive as it first seemed to me. I still think that if OpenSSL turns out to be using hardware to offload the processing, the extra memory copy might be noticeable, but I totally agree that in the general case this will not be an issue.
So, temporary buffer it is. Thank you very much for all the help.
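For concreteness, here is a minimal sketch of what such a temporary-buffer wrapper could look like; ssl_writev_copy() is just an illustrative name, not an OpenSSL API, and this is not production code:

/* Gather the iovec fragments into one contiguous buffer, then hand the
 * whole thing to SSL_write() in a single call.  This is the extra copy
 * discussed above. */
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <openssl/ssl.h>

int ssl_writev_copy(SSL *ssl, const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total == 0 || total > INT_MAX)
        return -1;

    unsigned char *buf = malloc(total);
    if (buf == NULL)
        return -1;

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    int ret = SSL_write(ssl, buf, (int)total);   /* one call, one extra copy */
    free(buf);
    return ret;
}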
On Sun, Feb 2, 2020 at 6:59 PM Michael Wojcik <Michael.Wojcik@xxxxxxxxxxxxxx> wrote:
This has of course come up before - there was an energetic discussion on this list back in May 2001, and then again in August of that year. Eric Rescorla was one of the participants (as was I).
And the answer has always been that given the minuscule performance gain,[1] and portability issues for platforms that don't have scatter/gather I/O, no one has been motivated to implement it. The OpenSSL core team have better things to do; and clearly no one else has found it sufficiently rewarding to implement it, submit a pull request, and advocate for its inclusion in the official distribution.
OpenSSL is open source, after all. There's nothing to stop anyone from adding SSL_writev to their own fork, testing the result, and submitting a PR.
Regarding the "many temporary buffers" problem - traditionally this has been solved with a buffer pool, such as the BSD mbuf architecture. A disadvantage of a single buffer pool is serialization for obtaining and releasing buffers; that can be relieved somewhat by using multiple buffer pools, with threads selecting a pool based on e.g. a hash of the thread ID. That gives you multiple, smaller lock domains.
[1] Yes, "wasted" cycles are wasted cycles. But by Amdahl's Law, optimizing a part of the system where performance is dominated by other considerations that are two or more orders of magnitude larger can never gain you even a single percentage point of improvement. Is it really that useful to improve your application's capacity from, say, 100,000 clients to 100,100? What's the value of that relative to the cost of implementing and testing a new API?
--
Michael Wojcik
Distinguished Engineer, Micro Focus