At the risk of further muddying the waters, there's another minor tweak that could improve performance on certain workloads. Currently, you mmap() a range for a given socket and then call getsockopt() to receive into it. If you instead made it possible to mmap() something once and use it for any number of sockets (by mmapping /dev/misc/tcp_zero_receive or whatever), the getsockopt() path would perform identically, but you could release the mappings for many sockets at once with a single TLB flush rather than one per socket. For some use cases, this could be a big win. It would also be easy enough to add later.
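A rough sketch of the userspace side of that idea, under loud assumptions: an anonymous mapping stands in for mmapping the hypothetical /dev/misc/tcp_zero_receive, and the struct and helpers (shared_rx_region, region_init, region_slot, region_release) are made up for illustration, not an existing API. Each socket gets a fixed slot inside one region, and a single munmap() tears down every slot at once.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical model: one shared region carved into per-socket slots. */
struct shared_rx_region {
    unsigned char *base; /* start of the single mapping */
    size_t slot_size;    /* bytes reserved per socket */
    size_t nslots;       /* number of sockets sharing the region */
};

/* Map one region large enough for nslots sockets. The anonymous mapping
 * is only a stand-in for mmap()ing the hypothetical device node. */
static int region_init(struct shared_rx_region *r, size_t nslots,
                       size_t slot_pages)
{
    assert(nslots > 0 && slot_pages > 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    r->slot_size = slot_pages * (size_t)page;
    r->nslots = nslots;
    void *p = mmap(NULL, r->slot_size * nslots, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        r->base = NULL;
        return -1;
    }
    r->base = p;
    return 0;
}

/* Address where socket idx's received data would land, or NULL if idx
 * is out of range. In the real scheme, getsockopt() would fill this. */
static unsigned char *region_slot(const struct shared_rx_region *r,
                                  size_t idx)
{
    return idx < r->nslots ? r->base + idx * r->slot_size : NULL;
}

/* Release every socket's slot with one munmap() call: one teardown for
 * the whole range instead of one per socket mapping. */
static int region_release(struct shared_rx_region *r)
{
    int rc = munmap(r->base, r->slot_size * r->nslots);
    r->base = NULL;
    return rc;
}
```

The point is only the shape: N sockets share one mapping, so teardown is one munmap() over one contiguous range (which the kernel can typically cover with a single ranged TLB flush) rather than N separate unmaps.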