On Sat, Dec 08, 2012 at 05:35:41AM -0700, Chris Mason wrote:
> On Sat, Dec 08, 2012 at 05:17:31AM -0700, Christoph Hellwig wrote:
> > On Mon, Dec 03, 2012 at 11:14:03AM -0500, Josef Bacik wrote:
> > > On Mon, Dec 03, 2012 at 08:41:25AM -0700, Christoph Hellwig wrote:
> > > > On Mon, Dec 03, 2012 at 08:37:20AM -0500, Josef Bacik wrote:
> > > > > Btrfs is terrible with O_DIRECT|O_SYNC, mostly because of the constant
> > > > > waiting. The thing is we have a handy way of waiting for IO that we can
> > > > > delay to the very last second so we do all of the O_SYNC work and then wait
> > > > > for a bunch of IO to complete. So introduce a flag to allow the generic
> > > > > direct io stuff to forgo waiting and leave that up to the file system.
> > > > > Thanks,
> > > > 
> > > > I don't really like passing another flag for this. If we are going to
> > > > do something like this it should be in a way where:
> > > > 
> > > > - the actual waiting code is a helper that btrfs would also use
> > > > - the main dio code is structured in a way that we have a lower level
> > > >   entry point that skips the waiting, and a higher level one that also
> > > >   calls it.
> > > > 
> > > > That being said I'm not imaginative enough to see how you're actually
> > > > going to use it. Posting the btrfs side would help with that.
> > > > 
> > > 
> > > Hrm so I can do that, but it may not make much sense. Here are the two patches
> > > that are relevant (older versions but they get the idea across)
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=78b40072c556d82fac5e58793a3178887ac057ec
> > > http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=b7728f1b19eeb2041e3d4da22fd3d5a5c11abd3c
> > 
> > I've looked over the patches but I still don't know what's going on,
> > sorry for having to poke a bit deeper by mail.
> > 
> > > 
> > > Basically what happens with btrfs now in O_SYNC/fsync() with either O_DIRECT or
> > > not is this
> > > 
> > > write()
> > > fsync()/O_SYNC
> > > start and wait on all io to complete
> > > log changed metadata into special tree
> > > write and wait on our new log
> > > sync super which points at our new log
> > > 
> > > What I'm trying to accomplish is this
> > > 
> > > write()
> > > fsync()/O_SYNC
> > > start io
> > > log changed metadata into special tree
> > > write log and then wait on log and data
> > 
> > How is that going to be safe? You must only update the metadata once the
> > data has made it to disk, that is the actual disk I/O for the metadata
> > must only start once the disk I/O for the data has finished. For
> > exactly that scenario the direct I/O code supports the end_io callback
> > to notify the filesystem efficiently.
> 
> Thanks for reading through things. The current model without the patch
> looks like this:
> 
> [ write data, wait for data ] [ write various tree blocks, wait ]
> [ write the super, wait ]
> 
> One data block, 3 waits. But thanks to COW, the super commits the
> metadata, so we could do this:
> 
> [ write the data ] [ write various tree blocks ] [ wait on all of it ]
> [ write the super, wait ]
> 
> That's down to two waits. If we start using atomic writes on flash, we can
> do it all as a single IO.

So I have this (Josef's v2) in a branch here. I'm happy to wait a kernel
release if we'd like to hash it out. But if it makes sense I'll send it in
with my pull request.

-chris