On Tue, Feb 04, 2014 at 04:52:20AM -0800, Kent Overstreet wrote:
> I'm on vacation in Switzerland, didn't bring my adderall, and
> direct-io.c makes my head hurt at the best of times, but - have a look
> at my in-progress dio rewrite:
>
> http://evilpiepirate.org/git/linux-bcache.git/commit/?h=block_stuff&id=ca09c20f08efd640f255fabd778de0dbf43ed1da
>
> Where I'm headed with things is to just start out by allocating bios and
> pinning pages into them, and _then_ doing all the fun "ask the
> filesystem where it goes and what to do with it" dance. The goal is to
> push the bios as far up the stack as possible.

How far would that be? E.g. for something like NFS it would be
completely wrong.

AFAICS, what you are doing there isn't incompatible with what I
described; bio_get_user_pages() would just use that primitive (in fact,
the loop in it is a damn good starting point for an implementation of
that primitive for iovec-based instances of iov_iter).

I'm not too fond of the names, TBH - it might make sense to rename
iov_iter to something like mem_stream; maybe even leave iov_iter as-is
for whatever users might remain, but for now I don't see any that would
fundamentally depend on the thing being iovec-backed...

And bio_vec is a bad misnomer - it's not related to the block subsystem
at all. Sure, it had originated there, but... Hell knows; by now it's
probably too much PITA to rename (we have about half a thousand
instances in the tree). Pity, that...

I definitely don't buy "bio is a natural object for carrying an array
of pieces of pages"; not sure if that's what you implied in the earlier
thread, but it has too much baggage from the block subsystem *and* it
lacks the things we may want to associate with individual elements of
such an array (starting with a "how can I steal that page?" method).
I'm not sure if you'd been reading that thread back when it started; my
interest in that thing is mostly because I want to get rid of the
duplication (and inconsistencies) between ->aio_write() and
->splice_write(). I hadn't been watching the threads around iov_iter
last year; hch pointed to those when I proposed to use an object that
could carry both an iovec and a (possibly extended) analog of bio_vec
and make generic_file_aio_write() et al. agnostic wrt what's behind
that object. Then we could use the same method to implement both
->aio_write() and ->splice_write() in a lot of cases.

iov_iter is a good starting point for such an object, and for now I'm
mostly doing stuff that encapsulates the knowledge of its guts
(including "there's an iovec behind it"). Those cleanups aside (and
they make sense on their own, regardless of where the rest goes), it
might make sense to add a copy of struct iov_iter that would have a
tagged union in it (originally just for iovec, with
IOVEC_READ/IOVEC_WRITE as possible tags) and switch a bunch of places
that do not look into the guts of iov_iter to that thing. I'm not sure
if there will be any other places left (so far it looks like we'll be
able to get away with a reasonable set of primitives), but... we'll
see.

For now the whole thing is fairly experimental and it will almost
certainly be reordered, etc. quite a few times. I'm trying to keep the
part of the queue in vfs.git#iov_iter more or less stable (with a lot
of stuff in flux sitting in the local one), but it's not at the state
where I'd recommend merges from it; there will be rebases, etc.

BTW, folks, any suggestions about the name of that "memory stream"
thing? struct iov_iter really implies an iterator for iovec, and a more
generic name would probably be better... struct mem_stream would
probably do if nobody comes up with a better variant, but it's long and
somewhat clumsy...
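
To make the tagged-union idea above a bit more concrete, here is a rough userspace sketch, not kernel code and not what is in vfs.git. The names struct mem_stream, ms_init_iovec() and ms_copy_from(), as well as the MS_ prefix on the tags, are made up for illustration; only the "union with iovec as the first variant, tagged IOVEC_READ/IOVEC_WRITE" shape comes from the text. Callers go through the primitives and never look at the union directly, so another variant could be added later without touching them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical tags; a bio_vec-like variant would add more. */
enum ms_type { MS_IOVEC_READ, MS_IOVEC_WRITE };

/* Hypothetical "memory stream": iov_iter-like state plus a tagged union. */
struct mem_stream {
	enum ms_type type;
	size_t count;		/* bytes left in the whole stream */
	size_t offset;		/* offset into the current segment */
	union {
		struct {
			const struct iovec *iov;
			unsigned long nr_segs;
		} iovec;
		/* other backing stores would go here */
	} u;
};

void ms_init_iovec(struct mem_stream *ms, enum ms_type type,
		   const struct iovec *iov, unsigned long nr_segs)
{
	size_t total = 0;

	for (unsigned long i = 0; i < nr_segs; i++)
		total += iov[i].iov_len;
	ms->type = type;
	ms->count = total;
	ms->offset = 0;
	ms->u.iovec.iov = iov;
	ms->u.iovec.nr_segs = nr_segs;
}

/* Copy up to 'bytes' out of the stream into dst and advance the stream. */
size_t ms_copy_from(struct mem_stream *ms, void *dst, size_t bytes)
{
	size_t done = 0;

	if (bytes > ms->count)
		bytes = ms->count;

	switch (ms->type) {
	case MS_IOVEC_READ:
	case MS_IOVEC_WRITE:
		while (done < bytes) {
			const struct iovec *iov;
			size_t chunk;

			assert(ms->u.iovec.nr_segs > 0);
			iov = ms->u.iovec.iov;
			chunk = iov->iov_len - ms->offset;
			if (chunk > bytes - done)
				chunk = bytes - done;
			memcpy((char *)dst + done,
			       (const char *)iov->iov_base + ms->offset, chunk);
			done += chunk;
			ms->offset += chunk;
			if (ms->offset == iov->iov_len) {
				/* segment exhausted, move to the next one */
				ms->u.iovec.iov++;
				ms->u.iovec.nr_segs--;
				ms->offset = 0;
			}
		}
		break;
	}
	ms->count -= done;
	return done;
}
```

A caller would set the stream up once with ms_init_iovec() and then pull data with ms_copy_from() in whatever chunk sizes suit it; segment boundaries stay hidden behind the primitive.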