On Wed, 3 Nov 2010 07:14:56 -0400
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Tue, 2 Nov 2010 20:59:00 +0300
> Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
>
> > 2010/11/2 Jeff Layton <jlayton@xxxxxxxxxx>:
> > > On Tue, 2 Nov 2010 12:02:24 +0300
> > > Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
> > >
> > >> Modify cifs_file_aio_write and cifs_write_end to let the client works with
> > >> strict cache mode.
> > >>
> > >> Signed-off-by: Pavel Shilovsky <piastryyy@xxxxxxxxx>
> > >> ---
> > >>  fs/cifs/cifsfs.c |   35 ++++++++++++++++++++++++++++++-----
> > >>  fs/cifs/file.c   |   14 ++++++++++++--
> > >>  2 files changed, 42 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> > >> index 1b44a92..e1ecd35 100644
> > >> --- a/fs/cifs/cifsfs.c
> > >> +++ b/fs/cifs/cifsfs.c
> > >> @@ -602,12 +602,37 @@ static ssize_t cifs_file_aio_read(struct kiocb
> > >> *iocb, const struct iovec *iov,
> > >>  static ssize_t cifs_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
> > >>                                     unsigned long nr_segs, loff_t pos)
> > >>  {
> > >> -        struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
> > >> -        ssize_t written;
> > >> +        struct inode *inode;
> > >> +        struct cifs_sb_info *cifs_sb;
> > >> +        ssize_t written, cache_written;
> > >> +        loff_t saved_pos;
> > >> +
> > >> +        inode = iocb->ki_filp->f_path.dentry->d_inode;
> > >> +
> > >> +        if (CIFS_I(inode)->clientCanCacheAll)
> > >> +                return generic_file_aio_write(iocb, iov, nr_segs, pos);
> > >> +
> > >> +        cifs_sb = CIFS_SB(iocb->ki_filp->f_path.dentry->d_sb);
> > >> +
> > >> +        if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_STRICT_IO) == 0) {
> > >> +                written = generic_file_aio_write(iocb, iov, nr_segs, pos);
> > >> +                filemap_write_and_wait(inode->i_mapping);
> > >                     ^^^^^^^^^^^^^^^^^^
> > >                 You can't ignore the return code from this. That function
> > >                 may return an error if writeback fails. Also, I don't
> > >                 see any need to wait on the result in this case. Why
> > >                 not just kick off the I/O and return (do a
> > >                 filemap_fdatawrite, IOW).
> >
> > I don't change non-strict variant for writing - we have this code now
> > in the git tree. But I agree - we should think about return value in
> > this case.
> >
> > >
> > >> +                return written;
> > >> +        }
> > >> +
> > >> +        saved_pos = pos;
> > >> +        written = cifs_user_write(iocb->ki_filp, iov->iov_base,
> > >> +                                  iov->iov_len, &pos);
> > >> +
> > >> +        if (written > 0) {
> > >> +                cache_written = generic_file_aio_write(iocb, iov,
> > >> +                                                       nr_segs, saved_pos);
> > >> +                if (cache_written != written)
> > >> +                        cERROR(1, "Cache written and server written data "
> > >> +                               "lengths are different");
> > >> +        } else
> > >> +                iocb->ki_pos = pos;
> > >>
> > >         ^^^^^
> > > This seems awfully complicated. Why not just do a
> > > generic_file_aio_write to get this into the cache and then just do a
> > > filemap_write_and_wait and deal with the result?
> >
> > The main reason of doing this is mandatory byte-range locks. If we
> > simply do generic_file_aio_write and then filemap_write_and_wait we
> > can fail in the following situation:
> > 1) process1 opens file and sets a mandatory lock from 0 to 1.
> > 2) process2 opens file and writes a data from 1 to 2.
> >
> > If we do like you suggest we fail on page writing (on
> > filemap_write_and_wait which writes whole page - from 0 to 2). That's
> > why I do cifs_user_write which writes the data from 1 to 2 (it's what
> > we need) and then store it in the cache by generic_file_aio_write
> > (with the little change in cifs_write_end that doesn't write the same
> > data twice to the server). So, right working of write and read ops
> > with mandatory locks is one of the reasons to provide srict cache
> > semantic.
> >
>
> Why store it in the cache at all at that point then? Is that for mmap?
>
> This seems really ugly and doesn't pass the "sniff test" -- something
> smells foul here. There must be a better way to handle this sort of
> thing...
>

To elaborate...

Rather than calling down into generic_file_aio_write, I think you'd be
better served by simply invalidating the pages in the range that the
write touched, or possibly just invalidating the entire cached inode.

Also, I still haven't seen a description of what the semantics for mmap
will be in this case. If I'm using strict caching and mmap a file, it
obviously isn't going to read/write through every time userspace touches
the memory. What can I expect to happen when I read or write to that
mmap? How can I ensure that new data will be faulted in or data that I
write will be synced out?

This needs to be settled before we can consider merging this code.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
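
To make the range-invalidation suggestion above a bit more concrete,
here is a rough userspace toy model in C. It is not kernel code and not
part of the patch under review: every toy_* name, the tiny page size,
and the in-memory "server" array are hypothetical stand-ins chosen for
illustration. The write path sends only the requested byte range to the
server and then drops the cached pages that overlap it, so a later read
refills them from the server instead of trusting stale cache contents.
In the kernel the invalidation step would presumably map onto an
existing helper such as invalidate_inode_pages2_range(), but that
mapping is an assumption here, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy sizes; tiny pages keep the behaviour easy to follow. */
#define TOY_PAGE_SIZE 8u
#define TOY_FILE_SIZE 32u
#define TOY_NR_PAGES  (TOY_FILE_SIZE / TOY_PAGE_SIZE)

struct toy_file {
	unsigned char server[TOY_FILE_SIZE];              /* authoritative copy */
	unsigned char cache[TOY_NR_PAGES][TOY_PAGE_SIZE]; /* client page cache */
	bool uptodate[TOY_NR_PAGES];                      /* is the cached page valid? */
};

/* Drop every cached page that overlaps [pos, pos + len). */
static void toy_invalidate_range(struct toy_file *f, size_t pos, size_t len)
{
	if (len == 0)
		return;

	size_t first = pos / TOY_PAGE_SIZE;
	size_t last = (pos + len - 1) / TOY_PAGE_SIZE;

	for (size_t i = first; i <= last; i++)
		f->uptodate[i] = false;
}

/*
 * Strict-mode write in the toy model: send exactly the requested bytes
 * to the server (no whole-page writeback), then invalidate the touched
 * pages rather than copying the data into the cache as well.
 * Returns the number of bytes written.
 */
static size_t toy_strict_write(struct toy_file *f, const void *buf,
			       size_t pos, size_t len)
{
	if (pos >= TOY_FILE_SIZE)
		return 0;
	if (len > TOY_FILE_SIZE - pos)
		len = TOY_FILE_SIZE - pos;

	memcpy(&f->server[pos], buf, len);
	toy_invalidate_range(f, pos, len);
	return len;
}

/* Read one byte through the cache, refilling an invalid page from the server. */
static unsigned char toy_read_byte(struct toy_file *f, size_t pos)
{
	assert(pos < TOY_FILE_SIZE);

	size_t idx = pos / TOY_PAGE_SIZE;

	if (!f->uptodate[idx]) {
		memcpy(f->cache[idx], &f->server[idx * TOY_PAGE_SIZE],
		       TOY_PAGE_SIZE);
		f->uptodate[idx] = true;
	}
	return f->cache[idx][pos % TOY_PAGE_SIZE];
}
```

Note that the write never touches bytes outside the requested range,
which is the property the mandatory byte-range lock case needs, and
that it says nothing about mmap: pages mapped into userspace are a
separate question that this sketch does not try to answer.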