+ fs-prepare_write-fixes.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled

     fs: prepare_write fixes

has been added to the -mm tree.  Its filename is

     fs-prepare_write-fixes.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: fs: prepare_write fixes
From: Nick Piggin <nickpiggin@xxxxxxxxxxxx>

Some prepare/commit_write implementations have possible pre-existing bugs,
possible data leaks (by setting uptodate too early) and data corruption (by
not reading in non-modified parts of a !uptodate page).

Others are (also) broken when commit_write passes in a 0 length commit with
a !uptodate page (a change caused by buffered write deadlock fix patch).

Fix filesystems as best we can.  GFS2, OCFS2, Reiserfs, JFFS are nontrivial
and are likely broken.  All others at least need a glance.

Also, update vfs.txt to describe the prepare_write/commit_write changes.

Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 Documentation/filesystems/vfs.txt |   28 ++++++++----
 fs/affs/file.c                    |    4 +
 fs/cifs/file.c                    |   43 ++++++-------------
 fs/ecryptfs/mmap.c                |   17 +++----
 fs/ext3/inode.c                   |   63 +++++++++++++++-------------
 fs/fat/inode.c                    |    6 ++
 fs/fuse/file.c                    |    3 +
 fs/hostfs/hostfs_kern.c           |   13 +++++
 fs/jffs/inode-v23.c               |   11 ++++
 fs/jffs2/file.c                   |    3 +
 fs/nfs/file.c                     |    3 +
 fs/smbfs/file.c                   |    9 ++++
 fs/udf/file.c                     |    7 +++
 13 files changed, 130 insertions(+), 80 deletions(-)

diff -puN Documentation/filesystems/vfs.txt~fs-prepare_write-fixes Documentation/filesystems/vfs.txt
--- a/Documentation/filesystems/vfs.txt~fs-prepare_write-fixes
+++ a/Documentation/filesystems/vfs.txt
@@ -607,22 +607,32 @@ struct address_space_operations {
   	the given range of bytes is about to be written.  The
   	address_space should check that the write will be able to
   	complete, by allocating space if necessary and doing any other
-  	internal housekeeping.  If the write will update parts of
-  	any basic-blocks on storage, then those blocks should be
-  	pre-read (if they haven't been read already) so that the
+  	internal housekeeping.  If the write will *not* update any
+	parts of any basic-blocks of the page, then those blocks should
+	be pre-read (if they are not already uptodate) so that the
   	updated blocks can be written out properly.
 	The page will be locked.  If prepare_write wants to unlock the
   	page it, like readpage, may do so and return
   	AOP_TRUNCATED_PAGE.
 	In this case the prepare_write will be retried one the lock is
-  	regained.
+  	regained.a
+	prepare_write should *not* set the page uptodate unless it has
+	actually been filled with valid data (eg. that may be returned
+	with a read(2) syscall).
 
   commit_write: If prepare_write succeeds, new data will be copied
-        into the page and then commit_write will be called.  It will
-        typically update the size of the file (if appropriate) and
-        mark the inode as dirty, and do any other related housekeeping
-        operations.  It should avoid returning an error if possible -
-        errors should have been handled by prepare_write.
+        into the page and then commit_write will be called. commit_write may
+	be called with a range that is smaller than that passed in to
+	prepare_write, it could even be zero. If the page is not uptodate,
+	the range will *only* be zero or the full length that was passed to
+	prepare_write, if it is zero, the page should not be marked uptodate
+	(success should still be returned, if possible -- the write will be
+	retried).
+	commit_write will typically update the size of the file (if
+	appropriate) and mark the inode as dirty, and do any other related
+	housekeeping operations. A successful commit_write should mark the page
+	uptodate if it is not already. It should avoid returning an error if
+	possible - errors should have been handled by prepare_write.
 
   bmap: called by the VFS to map a logical block offset within object to
   	physical block number. This method is used by the FIBMAP
diff -puN fs/affs/file.c~fs-prepare_write-fixes fs/affs/file.c
--- a/fs/affs/file.c~fs-prepare_write-fixes
+++ a/fs/affs/file.c
@@ -655,6 +655,10 @@ static int affs_commit_write_ofs(struct 
 	int written;
 
 	pr_debug("AFFS: commit_write(%u, %ld, %d, %d)\n", (u32)inode->i_ino, page->index, from, to);
+
+	if (to - from == 0)
+		return 0;
+
 	bsize = AFFS_SB(sb)->s_data_blksize;
 	data = page_address(page);
 
diff -puN fs/cifs/file.c~fs-prepare_write-fixes fs/cifs/file.c
--- a/fs/cifs/file.c~fs-prepare_write-fixes
+++ a/fs/cifs/file.c
@@ -1365,6 +1365,9 @@ static int cifs_commit_write(struct file
 	loff_t position = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
 	char *page_data;
 
+	if (to - offset == 0)
+		return 0;
+
 	xid = GetXid();
 	cFYI(1, ("commit write for page %p up to position %lld for %d", 
 		 page, position, to));
@@ -1418,8 +1421,6 @@ static int cifs_commit_write(struct file
 			rc = 0;
 		/* else if (rc < 0) should we set writebehind rc? */
 		kunmap(page);
-	} else {	
-		set_page_dirty(page);
 	}
 
 	FreeXid(xid);
@@ -1973,36 +1974,20 @@ int is_size_safe_to_change(struct cifsIn
 static int cifs_prepare_write(struct file *file, struct page *page,
 	unsigned from, unsigned to)
 {
-	int rc = 0;
         loff_t offset = (loff_t)page->index << PAGE_CACHE_SHIFT;
 	cFYI(1, ("prepare write for page %p from %d to %d",page,from,to));
-	if (!PageUptodate(page)) {
-	/*	if (to - from != PAGE_CACHE_SIZE) {
-			void *kaddr = kmap_atomic(page, KM_USER0);
-			memset(kaddr, 0, from);
-			memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-			flush_dcache_page(page);
-			kunmap_atomic(kaddr, KM_USER0);
-		} */
-		/* If we are writing a full page it will be up to date,
-		   no need to read from the server */
-		if ((to == PAGE_CACHE_SIZE) && (from == 0))
-			SetPageUptodate(page);
-
-		/* might as well read a page, it is fast enough */
-		if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
-			rc = cifs_readpage_worker(file, page, &offset);
-		} else {
-		/* should we try using another file handle if there is one -
-		   how would we lock it to prevent close of that handle
-		   racing with this read?
-		   In any case this will be written out by commit_write */
-		}
-	}
 
-	/* BB should we pass any errors back? 
-	   e.g. if we do not have read access to the file */
-	return 0;
+	if (PageUptodate(page))
+		return 0;
+	if (to - from == PAGE_CACHE_SIZE)
+		return 0;
+
+	/* might as well read a page, it is fast enough */
+	if ((file->f_flags & O_ACCMODE) != O_WRONLY)
+		return cifs_readpage_worker(file, page, &offset);
+
+	printk("CIFS does not yet support partial page writes on O_WRONLY files\n");
+	return -EIO;
 }
 
 const struct address_space_operations cifs_addr_ops = {
diff -puN fs/ecryptfs/mmap.c~fs-prepare_write-fixes fs/ecryptfs/mmap.c
--- a/fs/ecryptfs/mmap.c~fs-prepare_write-fixes
+++ a/fs/ecryptfs/mmap.c
@@ -337,16 +337,12 @@ out:
 static int ecryptfs_prepare_write(struct file *file, struct page *page,
 				  unsigned from, unsigned to)
 {
-	int rc = 0;
-
 	kmap(page);
-	if (from == 0 && to == PAGE_CACHE_SIZE)
-		goto out;	/* If we are writing a full page, it will be
-				   up to date. */
-	if (!PageUptodate(page))
-		rc = ecryptfs_do_readpage(file, page, page->index);
-out:
-	return rc;
+	if (PageUptodate(page))
+		return 0;
+	if (to - from == PAGE_CACHE_SIZE)
+		return 0;
+	return ecryptfs_do_readpage(file, page, page->index);
 }
 
 int ecryptfs_grab_and_map_lower_page(struct page **lower_page,
@@ -634,6 +630,9 @@ static int ecryptfs_commit_write(struct 
 	struct ecryptfs_crypt_stat *crypt_stat;
 	int rc;
 
+	if (to - from == 0)
+		return 0;
+
 	inode = page->mapping->host;
 	lower_inode = ecryptfs_inode_to_lower(inode);
 	lower_file = ecryptfs_file_to_lower(file);
diff -puN fs/ext3/inode.c~fs-prepare_write-fixes fs/ext3/inode.c
--- a/fs/ext3/inode.c~fs-prepare_write-fixes
+++ a/fs/ext3/inode.c
@@ -1214,21 +1214,24 @@ static int ext3_ordered_commit_write(str
 	struct inode *inode = page->mapping->host;
 	int ret = 0, ret2;
 
-	ret = walk_page_buffers(handle, page_buffers(page),
-		from, to, NULL, ext3_journal_dirty_data);
+	if (to - from > 0) {
+		ret = walk_page_buffers(handle, page_buffers(page),
+			from, to, NULL, ext3_journal_dirty_data);
 
-	if (ret == 0) {
-		/*
-		 * generic_commit_write() will run mark_inode_dirty() if i_size
-		 * changes.  So let's piggyback the i_disksize mark_inode_dirty
-		 * into that.
-		 */
-		loff_t new_i_size;
+		if (ret == 0) {
+			/*
+			 * generic_commit_write() will run mark_inode_dirty()
+			 * if i_size changes.  So let's piggyback the
+			 * i_disksize mark_inode_dirty into that.
+			 */
+			loff_t new_i_size;
 
-		new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
-		if (new_i_size > EXT3_I(inode)->i_disksize)
-			EXT3_I(inode)->i_disksize = new_i_size;
-		ret = generic_commit_write(file, page, from, to);
+			new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT)
+									+ to;
+			if (new_i_size > EXT3_I(inode)->i_disksize)
+				EXT3_I(inode)->i_disksize = new_i_size;
+			ret = generic_commit_write(file, page, from, to);
+		}
 	}
 	ret2 = ext3_journal_stop(handle);
 	if (!ret)
@@ -1268,23 +1271,25 @@ static int ext3_journalled_commit_write(
 	int partial = 0;
 	loff_t pos;
 
-	/*
-	 * Here we duplicate the generic_commit_write() functionality
-	 */
-	pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+	if (to - from > 0) {
+		/*
+		 * Here we duplicate the generic_commit_write() functionality
+		 */
+		pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
 
-	ret = walk_page_buffers(handle, page_buffers(page), from,
-				to, &partial, commit_write_fn);
-	if (!partial)
-		SetPageUptodate(page);
-	if (pos > inode->i_size)
-		i_size_write(inode, pos);
-	EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
-	if (inode->i_size > EXT3_I(inode)->i_disksize) {
-		EXT3_I(inode)->i_disksize = inode->i_size;
-		ret2 = ext3_mark_inode_dirty(handle, inode);
-		if (!ret)
-			ret = ret2;
+		ret = walk_page_buffers(handle, page_buffers(page), from,
+					to, &partial, commit_write_fn);
+		if (!partial)
+			SetPageUptodate(page);
+		if (pos > inode->i_size)
+			i_size_write(inode, pos);
+		EXT3_I(inode)->i_state |= EXT3_STATE_JDATA;
+		if (inode->i_size > EXT3_I(inode)->i_disksize) {
+			EXT3_I(inode)->i_disksize = inode->i_size;
+			ret2 = ext3_mark_inode_dirty(handle, inode);
+			if (!ret)
+				ret = ret2;
+		}
 	}
 	ret2 = ext3_journal_stop(handle);
 	if (!ret)
diff -puN fs/fat/inode.c~fs-prepare_write-fixes fs/fat/inode.c
--- a/fs/fat/inode.c~fs-prepare_write-fixes
+++ a/fs/fat/inode.c
@@ -150,7 +150,11 @@ static int fat_commit_write(struct file 
 			    unsigned from, unsigned to)
 {
 	struct inode *inode = page->mapping->host;
-	int err = generic_commit_write(file, page, from, to);
+	int err;
+	if (to - from > 0)
+		return 0;
+
+	err = generic_commit_write(file, page, from, to);
 	if (!err && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {
 		inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
 		MSDOS_I(inode)->i_attrs |= ATTR_ARCH;
diff -puN fs/fuse/file.c~fs-prepare_write-fixes fs/fuse/file.c
--- a/fs/fuse/file.c~fs-prepare_write-fixes
+++ a/fs/fuse/file.c
@@ -467,6 +467,9 @@ static int fuse_commit_write(struct file
 	if (is_bad_inode(inode))
 		return -EIO;
 
+	if (to - from == 0)
+		return 0;
+
 	req = fuse_get_req(fc);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
diff -puN fs/hostfs/hostfs_kern.c~fs-prepare_write-fixes fs/hostfs/hostfs_kern.c
--- a/fs/hostfs/hostfs_kern.c~fs-prepare_write-fixes
+++ a/fs/hostfs/hostfs_kern.c
@@ -469,6 +469,11 @@ int hostfs_prepare_write(struct file *fi
 	long long start, tmp;
 	int err;
 
+	if (PageUptodate(page))
+		return 0;
+	if (to - from == PAGE_CACHE_SIZE)
+		return 0;
+
 	start = (long long) page->index << PAGE_CACHE_SHIFT;
 	buffer = kmap(page);
 	if(from != 0){
@@ -498,10 +503,14 @@ int hostfs_commit_write(struct file *fil
 	long long start;
 	int err = 0;
 
+	if (to - from == 0)
+		return 0;
+
 	start = (((long long) page->index) << PAGE_CACHE_SHIFT) + from;
 	buffer = kmap(page);
 	err = write_file(FILE_HOSTFS_I(file)->fd, &start, buffer + from,
 			 to - from);
+	kunmap(page);
 	if(err > 0) err = 0;
 
 	/* Actually, if !err, write_file has added to-from to start, so, despite
@@ -511,7 +520,9 @@ int hostfs_commit_write(struct file *fil
 	if(!err && (start > inode->i_size))
 		inode->i_size = start;
 
-	kunmap(page);
+	if (!PageUptodate(page) && to - from == PAGE_CACHE_SIZE)
+		SetPageUptodate(page);
+
 	return(err);
 }
 
diff -puN fs/jffs/inode-v23.c~fs-prepare_write-fixes fs/jffs/inode-v23.c
--- a/fs/jffs/inode-v23.c~fs-prepare_write-fixes
+++ a/fs/jffs/inode-v23.c
@@ -1531,7 +1531,7 @@ jffs_prepare_write(struct file *filp, st
 	/* FIXME: we should detect some error conditions here */
 
 	/* Bugger that. We should make sure the page is uptodate */
-	if (!PageUptodate(page) && (from || to < PAGE_CACHE_SIZE))
+	if (!PageUptodate(page) && (to - from < PAGE_CACHE_SIZE))
 		return jffs_do_readpage_nolock(filp, page);
 
 	return 0;
@@ -1541,11 +1541,18 @@ static int
 jffs_commit_write(struct file *filp, struct page *page,
                  unsigned from, unsigned to)
 {
+	int ret;
+
        void *addr = page_address(page) + from;
        /* XXX: PAGE_CACHE_SHIFT or PAGE_SHIFT */
        loff_t pos = page_offset(page) + from;
 
-       return jffs_file_write(filp, addr, to-from, &pos);
+       if (to - from == 0)
+       		return 0;
+
+       ret = jffs_file_write(filp, addr, to-from, &pos);
+       if (!PageUptodate(page) && (to - from == PAGE_CACHE_SIZE))
+	       SetPageUptodate(page);
 } /* jffs_commit_write() */
 
 /* This is our ioctl() routine.  */
diff -puN fs/jffs2/file.c~fs-prepare_write-fixes fs/jffs2/file.c
--- a/fs/jffs2/file.c~fs-prepare_write-fixes
+++ a/fs/jffs2/file.c
@@ -222,6 +222,9 @@ static int jffs2_commit_write (struct fi
 	D1(printk(KERN_DEBUG "jffs2_commit_write(): ino #%lu, page at 0x%lx, range %d-%d, flags %lx\n",
 		  inode->i_ino, pg->index << PAGE_CACHE_SHIFT, start, end, pg->flags));
 
+	if (end - start == 0)
+		return 0;
+
 	if (end == PAGE_CACHE_SIZE) {
 		if (!start) {
 			/* We need to avoid deadlock with page_cache_read() in
diff -puN fs/nfs/file.c~fs-prepare_write-fixes fs/nfs/file.c
--- a/fs/nfs/file.c~fs-prepare_write-fixes
+++ a/fs/nfs/file.c
@@ -299,6 +299,9 @@ static int nfs_commit_write(struct file 
 {
 	long status;
 
+	if (to - offset == 0)
+		return 0;
+
 	lock_kernel();
 	status = nfs_updatepage(file, page, offset, to-offset);
 	unlock_kernel();
diff -puN fs/smbfs/file.c~fs-prepare_write-fixes fs/smbfs/file.c
--- a/fs/smbfs/file.c~fs-prepare_write-fixes
+++ a/fs/smbfs/file.c
@@ -293,6 +293,10 @@ out:
 static int smb_prepare_write(struct file *file, struct page *page, 
 			     unsigned offset, unsigned to)
 {
+	if (!PageUptodate(page) && to - offset < PAGE_CACHE_SIZE) {
+		printk("SMB does not yet support partial page writes\n");
+		return -EIO;
+	}
 	return 0;
 }
 
@@ -301,10 +305,15 @@ static int smb_commit_write(struct file 
 {
 	int status;
 
+	if (to - offset == 0)
+		return 0;
+
 	status = -EFAULT;
 	lock_kernel();
 	status = smb_updatepage(file, page, offset, to-offset);
 	unlock_kernel();
+	if (!PageUptodate(page) && to - offset == PAGE_CACHE_SIZE)
+		SetPageUptodate(page);
 	return status;
 }
 
diff -puN fs/udf/file.c~fs-prepare_write-fixes fs/udf/file.c
--- a/fs/udf/file.c~fs-prepare_write-fixes
+++ a/fs/udf/file.c
@@ -75,6 +75,10 @@ static int udf_adinicb_writepage(struct 
 
 static int udf_adinicb_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to)
 {
+	if (!PageUptodate(page) && to - offset < PAGE_CACHE_SIZE) {
+		printk("UDF does not yet support partial page writes\n");
+		return -EIO;
+	}
 	kmap(page);
 	return 0;
 }
@@ -84,6 +88,9 @@ static int udf_adinicb_commit_write(stru
 	struct inode *inode = page->mapping->host;
 	char *kaddr = page_address(page);
 
+	if (to - offset == 0)
+		return 0;
+
 	memcpy(UDF_I_DATA(inode) + UDF_I_LENEATTR(inode) + offset,
 		kaddr + offset, to - offset);
 	mark_inode_dirty(inode);
_

Patches currently in -mm which might be from nickpiggin@xxxxxxxxxxxx are

origin.patch
oom-killer-meets-userspace-headers.patch
mm-fix-pagecache-write-deadlocks-xip.patch
fs-prepare_write-fixes.patch
memory-page-alloc-minor-cleanups.patch
memory-page-alloc-minor-cleanups-fix.patch
cpuset-remove-useless-sched-domain-line.patch
cpuset-explicit-dynamic-sched-domain-control-flags.patch
sched2-sched-domain-sysctl.patch

-
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux