On Fri, Feb 17, 2012 at 11:11:37PM +0530, Anand Avati wrote: > On 02/17/2012 09:32 PM, Josef Bacik wrote: > >On Fri, Feb 17, 2012 at 08:49:58AM -0500, Anand Avati wrote: > >>Implement ->direct_IO() method in aops. The ->direct_IO() method combines > >>the existing fuse_direct_read/fuse_direct_write methods to implement > >>O_DIRECT functionality. > >> > >>Reaching ->direct_IO() in the read path via generic_file_aio_read ensures > >>proper synchronization with page cache with its existing framework. > >> > >>Reaching ->direct_IO() in the write path via fuse_file_aio_write is made > >>to come via generic_file_direct_write() which makes it play nice with > >>the page cache w.r.t other mmap pages etc. > >> > >>On files marked 'direct_io' by the filesystem server, IO always follows > >>the fuse_direct_read/write path. There is no effect of fcntl(O_DIRECT) > >>and it always succeeds. > >> > >>On files not marked with 'direct_io' by the filesystem server, the IO > >>path depends on O_DIRECT flag by the application. This can be passed > >>at the time of open() as well as via fcntl(). > >> > >>Note that asynchronous O_DIRECT iocb jobs are completed synchronously > >>always (this has been the case with FUSE even before this patch) > >> > >>Signed-off-by: Anand Avati<avati@xxxxxxxxxx> > >>--- > >> > >>Test case: > >> > >>- concurrent read and write DDs with oflag=direct and iflag=direct set > >> in a few writes and a few reads > >> > >>- artificially decrease 6th parameter to generic_file_direct_write to > >> simulate a partial write of the direct IO request and verify proper > >> buffered I/O for remaining offset and size with printk's and verify > >> write completion at backend > >> > >> fs/fuse/dir.c | 3 - > >> fs/fuse/file.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++-------- > >> 2 files changed, 107 insertions(+), 20 deletions(-) > >> > > > >All in all it looks good, just one little nit > > > >>diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > >>index 2066328..7e5dbd0 100644 > >>--- a/fs/fuse/dir.c > >>+++ b/fs/fuse/dir.c > >>@@ -387,9 +387,6 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry, > >> if (fc->no_create) > >> return -ENOSYS; > >> > >>- if (flags& O_DIRECT) > >>- return -EINVAL; > >>- > >> forget = fuse_alloc_forget(); > >> if (!forget) > >> return -ENOMEM; > >>diff --git a/fs/fuse/file.c b/fs/fuse/file.c > >>index 4a199fd..9dd611b 100644 > >>--- a/fs/fuse/file.c > >>+++ b/fs/fuse/file.c > >>@@ -194,10 +194,6 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) > >> struct fuse_conn *fc = get_fuse_conn(inode); > >> int err; > >> > >>- /* VFS checks this, but only _after_ ->open() */ > >>- if (file->f_flags& O_DIRECT) > >>- return -EINVAL; > >>- > >> err = generic_file_open(inode, file); > >> if (err) > >> return err; > >>@@ -932,17 +928,23 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov, > >> struct file *file = iocb->ki_filp; > >> struct address_space *mapping = file->f_mapping; > >> size_t count = 0; > >>+ size_t ocount = 0; > >> ssize_t written = 0; > >>+ ssize_t written_buffered = 0; > >> struct inode *inode = mapping->host; > >> ssize_t err; > >> struct iov_iter i; > >>+ loff_t endbyte = 0; > >> > >> WARN_ON(iocb->ki_pos != pos); > >> > >>- err = generic_segment_checks(iov,&nr_segs,&count, VERIFY_READ); > >>+ ocount = 0; > >>+ err = generic_segment_checks(iov,&nr_segs,&ocount, VERIFY_READ); > >> if (err) > >> return err; > >> > >>+ count = ocount; > >>+ > >> mutex_lock(&inode->i_mutex); > >> vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE); > >> > >>@@ -962,11 +964,36 @@ static ssize_t fuse_file_aio_write(struct kiocb *iocb, const struct iovec *iov, > >> > >> file_update_time(file); > >> > >>- iov_iter_init(&i, iov, nr_segs, count, 0); > >>- written = fuse_perform_write(file, mapping,&i, pos); > >>- if (written>= 0) > >>- iocb->ki_pos = pos + written; > >>+ if (file->f_flags& O_DIRECT) { > >>+ written = generic_file_direct_write(iocb, iov,&nr_segs, > >>+ pos,&iocb->ki_pos, > >>+ count, ocount); > >>+ if (written< 0 || written == count) > >>+ goto out; > >>+ > >>+ pos += written; > >>+ count -= written; > >>+ > >>+ iov_iter_init(&i, iov, nr_segs, count, written); > >>+ written_buffered = fuse_perform_write(file, mapping,&i, pos); > >>+ if (written_buffered< 0) { > >>+ err = written_buffered; > >>+ goto out; > >>+ } > >>+ endbyte = pos + written_buffered - 1; > >> > >>+ err = filemap_write_and_wait_range(file->f_mapping, pos, > >>+ endbyte); > >>+ if (err) > >>+ goto out; > >>+ written += written_buffered; > >>+ iocb->ki_pos = pos + written_buffered; > > > >You need to call invalidate_mapping_pages here to evict the pages from > >pagecache, basically copy what __generic_file_aio_write does and you are good to > >go. Thanks, > > I have a more fundamental question. Does FUSE require such a > "buffered fallback" path at all? Both branches > (generic_file_direct_write on a fuse file and fuse_perform_write) > finally end up in a fuse_send_write(). What is the reason behind > expecting fuse_send_write() to fail via fuse_direct_IO() branch but > succeed via fuse_perform_write()? Ahh good point helps if I pay attention to what you are doing. You're right if it fails in the direct write I doubt trying it again with buffered writing is going to get you anywhere, I had it in my mind you were still using the generic dio infrastructure which can fail for a variety of reasons but you aren't so you could probably get away with just failing if generic_file_direct_write fails. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html