David S. Miller wrote:
> > A variant on my problem is when using sendfile() to send from a
> > shared-memory file.  The idea is to prepare data in mapped shared
> > memory, and then sendfile() it so that it doesn't have to be copied
> > out from userspace.  I haven't actually tried this, but I think it
> > might improve performance where large blocks of data are involved,
> > especially when the same data is sent on many sockets (e.g. cached
> > generated web pages), and certainly would reduce memory usage.
>
> Yes, I've suggested this scheme to some people in the past.
> It's a clean way to get zerocopy sendmsg(), the user does all
> of the buffer management with some minimal help from the kernel.

I'm not sure if it's cache-safe on architectures which have
non-coherent, virtually-indexed caches (like MIPS etc.).  There is no
explicit step, after modifying the data in user-space, which updates
the cached page in kernel-space/dma-space.  Do you know about that?

That's unlike sendmsg(), which does it at the time of the call, and
sendfile(), where it's done when the file is written, before
sendfile() is called.

> > It has the same problem: when to recycle pages in userspace.
>
> I guess the trick to this problem is to find some way to
> poll() for these events.  One idea is to use a socket option
> to tell TCP "don't signal POLLOUT until X".  So you do all
> of your writes, make the socket option call to say which buffer
> you want to wait to be free, then poll().

That wouldn't work, because the app also needs to know when it can
write more data to the socket, independently of when it can recycle
buffers.  So it would need to be a separate POLLX flag.  POLLWRBAND
could probably be abused for it, if there was a desire to wedge it
uncomfortably into the standard API.

> Patches (especially tested ones with a real app) are welcome :-)

I have a library which would be good at using this feature if it
existed, and I'd be happy to use the feature and release the library
to demonstrate its use.
However, I have neither the resources nor the inclination to properly
test the performance of such a feature.  If anyone's interested, I'd
be happy to collaborate on this.

No other OS which has sendfile() (there are a lot) has a way to check
when buffers are freed, unfortunately - they don't even have SIOCOUTQ
(not even FreeBSD).

-- Jamie
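
For reference, the scheme discussed above (prepare data in a shared mapping, sendfile() it, then work out when the pages may be reused) can be sketched roughly as follows. This is a minimal, Linux-only illustration under stated assumptions, not a tested implementation. The helper names map_shared_buffer(), send_shared() and bytes_in_send_queue() are made up for this example, and the backing file is an unlinked temporary file under /tmp rather than a real shared-memory filesystem. Polling SIOCOUTQ is only an approximation of "buffer is free": it reports how much data is still in the socket's send queue, not which pages are still referenced.

```c
/* Sketch only: Linux-specific, minimal error handling. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/sockios.h>      /* SIOCOUTQ */

/* Hypothetical helper: create an unlinked backing file of `len` bytes
 * and map it MAP_SHARED, so that stores through the mapping are what
 * a later sendfile() on *fd_out will transmit.  Returns NULL on error. */
static void *map_shared_buffer(size_t len, int *fd_out)
{
    char path[] = "/tmp/sendfile_demo_XXXXXX";  /* assumed location */
    int fd;
    void *p;

    assert(fd_out != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);               /* file lives only as long as the fd */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* Hypothetical helper: send the first `len` bytes of `fd` on `sock`
 * with sendfile(), looping over short writes.  Returns the number of
 * bytes sent, or -1 on error. */
static ssize_t send_shared(int sock, int fd, size_t len)
{
    off_t off = 0;

    while ((size_t)off < len) {
        ssize_t n = sendfile(sock, fd, &off, len - (size_t)off);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* nothing more could be sent */
    }
    return (ssize_t)off;
}

/* Hypothetical helper: ask the kernel how much data is still sitting
 * in the socket's send queue.  Returns -1 if the ioctl fails. */
static int bytes_in_send_queue(int sock)
{
    int pending = 0;

    if (ioctl(sock, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

An application would fill the mapping, call send_shared(), and avoid overwriting the buffer until bytes_in_send_queue() has dropped far enough. That is coarse: it cannot tell one buffer from another, and it is separate from POLLOUT, which is the distinction made above about needing a separate poll flag.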