On Fri, Jan 17, 2014 at 11:22:04PM -0800, Linus Torvalds wrote:
> On Fri, Jan 17, 2014 at 10:40 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Objections, comments?
>
> I certainly object to the "map, then unmap" approach. No VM games.

Um...

int pipe_to_file(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
		 struct splice_desc *sd)
...
	if (buf->page != page) {
		char *src = buf->ops->map(pipe, buf, 1);
		char *dst = kmap_atomic(page);

		memcpy(dst + offset, src + buf->offset, this_len);
		flush_dcache_page(page);
		kunmap_atomic(dst);
		buf->ops->unmap(pipe, buf, src);
	}
...

->map() and ->unmap() (BTW, why are those methods, anyway?  They are
identical for all instances) are

void *generic_pipe_buf_map(struct pipe_inode_info *pipe,
			   struct pipe_buffer *buf, int atomic)
{
	if (atomic) {
		buf->flags |= PIPE_BUF_FLAG_ATOMIC;
		return kmap_atomic(buf->page);
	}

	return kmap(buf->page);
}

and

void generic_pipe_buf_unmap(struct pipe_inode_info *pipe,
			    struct pipe_buffer *buf, void *map_data)
{
	if (buf->flags & PIPE_BUF_FLAG_ATOMIC) {
		buf->flags &= ~PIPE_BUF_FLAG_ATOMIC;
		kunmap_atomic(map_data);
	} else
		kunmap(buf->page);
}

resp.  If we are going to copy that data (and all users of
generic_file_splice_write() do that memcpy() to page cache), we have to
kmap the source ;-/

> But if it can be done more naturally as a writev, then that may well
> be ok. As long as we're talking about just the
> default_file_splice_write() case, and people who want to do special
> things with page movement can continue to do so..

The thing is, after such a change default_file_splice_write() is no worse
than generic_file_splice_write().  The only instances that really want
something else are the ones that try to steal pages (e.g. virtio_console,
fuse miscdev) or sockets, with their "do DMA from the sodding page, don't
copy it anywhere" ->sendpage() method.  IOW, the ones doing those special
things you are talking about.  Normal filesystems do not - not on
pipe-to-file splice.
file-to-pipe - sure, that one plays with pagecache and tries hard to do
zero-copy, but that's ->splice_read(), not ->splice_write()...

_If_ somebody figures out how to deal with zero-copy on pipe-to-file - fine,
we'll be able to revisit that.  But there hasn't been one since 2007 and
there has been zero activity in that area, so...

What I'm doing right now is taking do_readv_writev() apart and making the
stuff after rw_copy_check_uvector() non-static (visible in fs/internal.h).
As long as we do not go through rw_copy_check_uvector() (we'd just built
that iovec ourselves and it's already in kernel space), we should be fine -
single copy done straight to pagecache, with whatever locks fs wants to
take, etc.
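
For illustration only, here is a rough userspace analogue of the "build
the iovec ourselves, then do one writev" idea.  It is a sketch, not the
kernel patch: struct fake_pipe_buf, bufs_to_iovec() and write_bufs() are
made-up names standing in for struct pipe_buffer and the in-kernel
helpers, and plain writev(2) stands in for the post-
rw_copy_check_uvector() half of do_readv_writev().

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct pipe_buffer. */
struct fake_pipe_buf {
	char *page;	/* start of the backing "page" */
	size_t offset;	/* where the valid data begins within it */
	size_t len;	/* number of valid bytes */
};

/*
 * Describe each non-empty buffer with one iovec entry.  No data is
 * copied here; the entries only point into the existing buffers.
 * Returns the number of entries filled in (at most maxvec).
 */
static int bufs_to_iovec(const struct fake_pipe_buf *bufs, int nbufs,
			 struct iovec *vec, int maxvec)
{
	int n = 0;

	for (int i = 0; i < nbufs && n < maxvec; i++) {
		if (bufs[i].len == 0)
			continue;
		vec[n].iov_base = bufs[i].page + bufs[i].offset;
		vec[n].iov_len = bufs[i].len;
		n++;
	}
	return n;
}

/*
 * Hand all buffers to the destination in a single writev() call, so
 * the only copy is the one the destination performs itself.  The
 * 16-entry limit is arbitrary and chosen just for this sketch.
 */
static ssize_t write_bufs(int fd, const struct fake_pipe_buf *bufs, int nbufs)
{
	struct iovec vec[16];
	int n = bufs_to_iovec(bufs, nbufs, vec, 16);

	return writev(fd, vec, n);
}
```

The point of the shape is that the vector is assembled from buffers we
already own, so nothing in it needs the user-pointer validation that
rw_copy_check_uvector() exists for.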