On Mon, Feb 02, 2015 at 08:11:12AM +0000, Al Viro wrote: > On Mon, Feb 02, 2015 at 09:07:29AM +0100, Christoph Hellwig wrote: > > On Sat, Jan 31, 2015 at 06:08:41AM +0000, Al Viro wrote: > > > and be done with that. I'll put drivers/usb/gadget patches into a stable > > > branch and ask use folks to pull from it - that's the simplest of this > > > series, actually... > > > > use folks? Btw, does this mean you patches to switch it over, or do > > usb. > > > you want me to finish my conversion. > > I have f_fs.c conversion and partial legacy/inode.c one. Will post tomorrow, > about to fall asleep right now... FWIW, here's the current situation with read/write methods: 1) a bunch of file_operations instances have only ->read() and/or ->write(); no ->aio_... or ->..._iter versions, no ->splice_write(). Those are basically device drivers: readv is equivalent to loop over segments, with read on each. writev is equivalent to loop over segments, with write on each. AIO read and write are all synchronous splice_write is does kmap on each buffer in turn and write that Note that splitting a buffer into several adjacent pieces and submitting a vectored IO is *not* guaranteed to be equivalent to plain IO on the entire buffer - it has datagram-like semantics and boundaries do matter. 2) regular ->read_iter/->write_iter users: ->read is new_sync_read() (or NULL, if ->read_iter is NULL) ->write is new_sync_write() (or NULL, if ->write_iter is NULL) ->aio_read and ->aio_write are NULL ->splice_write is iter_splice_write (or NULL, if ->write_iter is NULL) Those are stream-style ones; what matter is concatenation of the buffers, not the boundaries. Right now the following is true: ->read_iter/->write_iter on a sync kiocb never returns EIOCBQUEUED and iterator argument is advanced exactly by the amount of data transferred. 3) ones like (2), but with NULL ->splice_write() in spite of present ->write_iter(). AFAICS they can be bulk-converted to (2). This is what the bulk of irregularities boil down to. 4) ones like (2) or (3), except ->read() and ->write() are left NULL instead of doing new_sync_{read,write}(). Mostly equivalent to (2); some codepaths call ->read() or ->write() directly, instead of going through vfs_... wrappers, and for those it can be a bit of a headache. In any case, there are very few such instances (only 3) and they are trivial to convert to (2). 5) Two instances in fs/9p have ->write_iter() *and* ->write(), the latter not being new_sync_write(). Ditto for ->read_iter() and ->read(). No other instance has such combinations. FWIW, they are *almost* regular - their ->read() and ->write() are new_sync_... unless it's an O_DIRECT file. It might make sense to try and convert those to ->direct_IO(); as it is, their O_DIRECT is ignored for readv()/writev(), which is arguably a bug. 6) drivers/char/mem.c stuff; they are equivalent to (2), but somewhat optimised. Some of that might make sense to convert to (2); there _are_ hot paths involved, so we'd better be careful there. 7) bad_file_ops. It is equivalent to (2); it's also a very special case. FWIW, I'm not at all sure that we *need* most of the methods in there. We never replace ->f_op of an opened file with that, so if we end up with that sucker, it happens on fresh open of something that had its ->i_fop replaced with bad_file_ops. And that will instantly fail in bad_file_open(). Why do we need the rest of the methods in there? AFAICS, we should drop all but ->open(). 8) hypfs - AFAICS, converts to (2) easily; I'll do that. 9) FUSE - ->aio_read/->aio_write user, with unusual ->splice_write as well. I _think_ it can be converted to (2), but that'll take a bit of work. 10) sockets; conversion to ->read_iter/->write_iter had been posted to netdev, ->splice_write() (the only user of ->sendpage(), BTW) is a work for the next cycle. 11) NTFS with rw support enabled. ->aio_write() user, needs to be converted to ->write_iter(). AFAICS, it wasn't particularly high priority for Anton (along with all rw stuff in fs/ntfs); it doesn't look terribly hard, but then it wasn't a high priority for me either. 12) virtio_console. Interesting ->splice_write() there; hadn't looked deep enough. 13) two odd drivers/usb/gadget instances. I have conversion for f_fs.c, but legacy/inode.c (ep_read() et.al.) is trickier. The problem in there is that writev() on a single-element vector is *not* equivalent to plain write(). The former treats the wrong-direction endpoint as EINVAL; the latter does if (usb_endpoint_xfer_isoc(&data->desc)) { mutex_unlock(&data->lock); return -EINVAL; } DBG (data->dev, "%s halt\n", data->name); spin_lock_irq (&data->dev->lock); if (likely (data->ep != NULL)) usb_ep_set_halt (data->ep); spin_unlock_irq (&data->dev->lock); mutex_unlock(&data->lock); return -EBADMSG; instead. IOW, for isochronous endpoints behaviour is the same, but the rest behaves differently. If not for that, that sucker would convert to (3) easily; ->splice_write() semantics is potentially interesting - the question is where do we want transfer boundaries. As it is, writev() collects the entire iovec and shoves it into single transfer; splice() to that thing ends up with each pipe_buf going in a separate transfer, mergable or not. I would really appreciate comments on semantics of that thing from USB folks... 14) ipathfs and qibfs: seriously different semantics for write and writev/AIO write. As in "different set of commands recognized"; AIO write plays like writev, whether it's vectored or not (and it's always synchronous). I've no idea who had come up with that... highly innovative API or why hadn't they simply added two files (it's all on their virtual filesystem, so they had full control of layout) rather that multiplexing two different command sets in such a fashion. 15) /dev/snd/pcmC*D*[cp]. Again, different semantics for write and writev, with the latter wanting nr_seqs equal to the number of channels. AIO non-vectored write fails unless there's only one channel. Not sure how ALSA userland uses that thing; AIO side is always synchronous, so it might be simply never used. FWIW, I'm not sure that write() on a single-channel one is equivalent to 1-element writev() - silencing-related logics seem to differ. That's it. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html