On Mon, Dec 03, 2012 at 08:41:25AM -0700, Christoph Hellwig wrote:
> On Mon, Dec 03, 2012 at 08:37:20AM -0500, Josef Bacik wrote:
> > Btrfs is terrible with O_DIRECT|O_SYNC, mostly because of the constant
> > waiting.  The thing is we have a handy way of waiting for IO that we can
> > delay to the very last second so we do all of the O_SYNC work and then wait
> > for a bunch of IO to complete.  So introduce a flag to allow the generic
> > direct io stuff to forgo waiting and leave that up to the file system.
> > Thanks,
>
> I don't really like passing another flag for this, if we are going to
> do something like this it should be in a way where:
>
>  - the actual waiting code is a helper that btrfs would also use
>  - the main dio code is structured in a way that we have a lower level
>    entry point that skips the waiting, and a higher level one that also
>    calls it.
>
> That being said I'm not imaginative enough to see how you're actually
> going to use it.  Posting the btrfs side would help with that.
>

Hrm so I can do that, but it may not make much sense.  Here are the two patches
that are relevant (older versions but they get the idea across)

http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=78b40072c556d82fac5e58793a3178887ac057ec
http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=b7728f1b19eeb2041e3d4da22fd3d5a5c11abd3c

Basically what happens with btrfs now in O_SYNC/fsync(), with or without
O_DIRECT, is this

write()
fsync()/O_SYNC
    start and wait on all io to complete
    log changed metadata into special tree
    write and wait on our new log
    sync super which points at our new log

What I'm trying to accomplish is this

write()
fsync()/O_SYNC
    start io
    log changed metadata into special tree
    write log and then wait on log and data
    sync super

This gives us a pretty great performance boost since we just have to wait the
one time (well, two if you include the super).
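
To put rough numbers on the two orderings above, here is a toy cost model.  It
is not btrfs code: the struct, the function names and the latencies are all
made up for illustration, and it assumes the data IO and the log write overlap
perfectly once both are in flight, which real hardware will only approximate.

```c
#include <assert.h>

/* Hypothetical latencies in arbitrary time units; not measured values. */
struct sync_cost {
	unsigned data_io;	/* time for the data writes to complete */
	unsigned log_io;	/* time for the log tree write to complete */
	unsigned super_io;	/* time for the super block write to complete */
};

/* Current flow: wait on data, then on the log, then on the super. */
static unsigned serialized_sync_time(const struct sync_cost *c)
{
	return c->data_io + c->log_io + c->super_io;
}

/*
 * Proposed flow: data and log are in flight together, so a single wait
 * covers whichever finishes last, followed by the super.
 */
static unsigned deferred_wait_sync_time(const struct sync_cost *c)
{
	unsigned overlapped = c->data_io > c->log_io ? c->data_io : c->log_io;

	return overlapped + c->super_io;
}

/* Time saved by deferring the wait; in this model, the shorter of the two IOs. */
static unsigned sync_time_saved(const struct sync_cost *c)
{
	unsigned before = serialized_sync_time(c);
	unsigned after = deferred_wait_sync_time(c);

	assert(after <= before);
	return before - after;
}
```

With made-up costs of data_io = 10, log_io = 4 and super_io = 2, the
serialized flow takes 16 units and the deferred one 12, so under these
assumptions the saving is the shorter of the data and log waits.
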
The catch is that in the O_DIRECT case the generic direct IO code always waits
for writes to complete before it returns to the file system.  For normal
O_DIRECT we want that, and that is all the first patch does: it waits for the
IO like we normally would.  But for fsync()/O_SYNC we want to forgo the waiting
until the last possible second, so we start io, gather up the ordered extents
(what we use to track pending IO), and then when we're ready we wait to make
sure those ordered extents have completed.

We already have our own helpers and such to keep track of when IO finishes for
a given range, so all we really need is a flag to tell O_DIRECT not to do what
it normally does, since we will take care of it.  I'm open to other ways to do
this, but I'd rather not go to all the trouble of creating new helpers and such
that btrfs will just never need to use.  Thanks,

Josef