Hi Christoph,

On Fri, Mar 09, 2007 at 10:39:13AM +0000, Christoph Hellwig wrote:
> Hi Nick,
>
> sorry for my late reply, this has been on my to-answer list for the last
> month and I only managed to get back to it now.

No worries, I haven't had much time to work on it since then anyway.
Thanks for taking a look.

> On Thu, Feb 08, 2007 at 02:07:36PM +0100, Nick Piggin wrote:
> > as a single call to copy a given amount of userdata at the given offset. This
> > is more flexible, because the implementation can determine how to best handle
> > errors, or multi-page ranges (eg. it may use a gang lookup), and only requires
> > one call into the fs.
>
> I really like this idea, especially for avoiding a call into the allocator
> for every block.  Have you asked the reiser4 folks whether this would
> supersede their batch_write op completely?

I haven't yet, although that's been on my todo list for when I get the
API into a more final state. batch_write seems quite similar, however
theirs is still page based, and a bit crufty, IMO. I found it to be
really clean to just pass down offsets, but that may be a matter for
debate.

What they _do_ have is a write actor function that will do the data
copy. This could be one possible way to get rid of ->prepare_write and
->commit_write, but I haven't tried that yet, because I'd rather not add
more redirection and complexity if I can avoid it...

> > One problem with this interface is that it cannot be used to write into the
> > filesystem by any means other than already-initialised buffers via iovecs. So
> > prepare/commit have to stay around for non-user data...
>
> Actually I think that's a good thing to a certain extent.  It reminds
> us that all other users are horrible abuse of the interface.  I'd even
> go so far as to make batch_write a callback that the filesystem passes
> to generic_file_aio_write to make clear it's not a generic thing but
> a helper.
> (It's not a generic thing because it's the upper layer writing
> into the pagecache, not a pagecache to fs below operation).

OK, if you think that's reasonable, then that is one hurdle out of the
way ;)

> That still leaves open how to get rid of ->prepare_write and ->commit_write
> completely, and for that we'll probably need ->kernel_read and ->kernel_write
> file operations.  But that's a step you shouldn't consider yet when doing
> this work.

I had a couple of possibilities for that. First is passing in a write
actor (eg. defaulting to the normal iovec usercopy), but as I said I
consider this more like fixing the problem with brute force (ie. just
making the interface more complex). Maybe as a last resort, though.

Another thing that would be much nicer from _my_ point of view would be
to just make all kernel users set up their data in an iovec, and use the
normal call with KERNEL_DS. Unfortunately, this is not the way a lot of
code expects to work, and it might require extra copying of the data.

> > Another thing is that it seems to be less able to be implemented in generic,
> > reusable code. It should be possible to introduce a new 2-op interface (or
> > maybe just a new error handler op) which can be used correctly in generic code.
>
> We should be able to find a nice abstraction for this, see my next mails.
>
> > +	/*
> > +	 * perform_write replaces prepare and commit_write callbacks.
> > +	 */
>
> This is a rather useless comment :)  Better remove it and add proper
> descriptions to Documentation/filesystems/vfs.txt and
> Documentation/filesystems/Locking

Will do. Thanks!
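
For anyone following the thread, the single-call idea quoted above (copy a given amount of user data at a given offset, with one call into the fs) can be illustrated with a small userspace model. This is only a sketch under loud assumptions: `toy_file`, `toy_perform_write` and the `TOY_*` constants are hypothetical names made up for illustration, the "page cache" is a plain array, and none of it is the kernel interface or the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define TOY_PAGE_SIZE 8u  /* tiny pages so a short write spans several */
#define TOY_NR_PAGES  4u

/* Hypothetical stand-in for a file's page cache; not a kernel structure. */
struct toy_file {
    unsigned char pages[TOY_NR_PAGES][TOY_PAGE_SIZE];
    size_t size;          /* current file size in bytes */
    unsigned fs_calls;    /* how often "the filesystem" was entered */
};

/*
 * Hypothetical perform_write-style op: a single call copies the whole
 * iovec range starting at byte offset pos, walking pages internally.
 * Returns the number of bytes copied, which is short if the toy file
 * runs out of space.
 */
ssize_t toy_perform_write(struct toy_file *f, const struct iovec *iov,
                          int iovcnt, size_t pos)
{
    const size_t capacity = (size_t)TOY_NR_PAGES * TOY_PAGE_SIZE;
    size_t written = 0;

    assert(f != NULL && iov != NULL);
    f->fs_calls++;        /* one entry into the fs for the whole range */

    for (int i = 0; i < iovcnt; i++) {
        const unsigned char *src = iov[i].iov_base;
        size_t left = iov[i].iov_len;

        while (left > 0 && pos < capacity) {
            size_t page = pos / TOY_PAGE_SIZE;
            size_t off = pos % TOY_PAGE_SIZE;
            size_t chunk = TOY_PAGE_SIZE - off;  /* room left in this page */

            if (chunk > left)
                chunk = left;
            memcpy(&f->pages[page][off], src, chunk);
            src += chunk;
            left -= chunk;
            pos += chunk;
            written += chunk;
        }
        if (left > 0)
            break;        /* out of space: report a short write */
    }

    if (written > 0 && pos > f->size)
        f->size = pos;
    return (ssize_t)written;
}
```

In this model, writing a two-segment iovec at an offset that straddles page boundaries enters `toy_perform_write` once, whereas a prepare/commit style loop would call into the filesystem once per page touched. How errors and partially copied pages are handled is exactly the part left to the implementation, so the sketch simply returns a short count.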