Re: [rfc][patch 3/4] fs: new truncate sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 07/08/2009 03:34 PM, Nick Piggin wrote:
> On Wed, Jul 08, 2009 at 06:47:01AM -0400, Christoph Hellwig wrote:
>> On Wed, Jul 08, 2009 at 08:32:25AM +0200, Nick Piggin wrote:
>>> Thanks for the patch, I think I will fold it in to the series. I
>>> think we probably do need to call simple_setsize in inode_setattr
>>> though (unless you propose to eventually convert every filesystem
>>> to define a .setattr). This would also require eg. your ext2
>>> conversion to strip ATTR_SIZE before passing through to inode_setattr.
>> Yes, we should eventually make .setattr mandatory.  Doing a default
>> action when a method lacks tends to cause more issues than it solves.
>>
>> I'm happy to help in doing that part of the conversion (and also other
>> bits)
> 
> OK well here is what I have now for 3/4 and 4/4. Basically just
> folded your patch on top, changed ordering of some checks, have
> fs clear ATTR_SIZE before calling inode_setattr, add a .new_truncate
> field to check against rather than .truncate, and provide a default
> ATTR_SIZE handler in inode_setattr (simple_setsize).
> 
> ---
> Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
> setattr > vmtruncate > truncate, have filesystems call their truncate sequence
> from ->setattr if filesystem specific operations are required. vmtruncate is
> deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
> previously should be used.
> 
> simple_setsize is also introduced to perform the equivalent of vmtruncate.
> simple_setsize gets called by inode_setattr when ATTR_SIZE is passed. So
> filesystems implementing their own truncate code in setattr then calling
> through to inode_setattr should clear ATTR_SIZE.
> 
> A new attribute is introduced into inode_operations structure; .new_truncate
> is a temporary hack to distinguish filesystems that implement the new
> truncate system. These guys cannot trim off block past i_size via vmtruncate,
> so instead they must handle it in fs code. This gives better opportunity to
> catch errors etc anyway. .new_truncate and .truncate will go away once all
> filesystems are converted.
> 
> Big problem with the previous calling sequence: the filesystem is not called
> until i_size has already changed.  This means it is not allowed to fail the
> call, and also it does not know what the previous i_size was. Also, generic
> code calling vmtruncate to truncate allocated blocks in case of error had
> no good way to return a meaningful error (or, for example, atomically handle
> block deallocation).
> 
> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
> ---
>  Documentation/filesystems/vfs.txt |    7 ++++++-
>  fs/attr.c                         |    7 ++++++-
>  fs/buffer.c                       |   12 +++++++++---
>  fs/direct-io.c                    |    7 ++++---
>  fs/libfs.c                        |   17 +++++++++++++++++
>  include/linux/fs.h                |    2 ++
>  mm/truncate.c                     |    6 ++----
>  7 files changed, 46 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/fs/libfs.c
> ===================================================================
> --- linux-2.6.orig/fs/libfs.c
> +++ linux-2.6/fs/libfs.c
> @@ -329,6 +329,22 @@ int simple_rename(struct inode *old_dir,
>  	return 0;
>  }
>  
> +int simple_setsize(struct inode *inode, loff_t newsize)
> +{
> +	loff_t oldsize;
> +	int error;
> +
> +	error = inode_newsize_ok(inode, newsize);
> +	if (error)
> +		return error;
> +
> +	oldsize = inode->i_size;
> +	i_size_write(inode, newsize);
> +	truncate_pagecache(inode, oldsize, newsize);
> +
> +	return error;
> +}
> +
>  int simple_readpage(struct file *file, struct page *page)
>  {
>  	clear_highpage(page);
> @@ -840,6 +856,7 @@ EXPORT_SYMBOL(generic_read_dir);
>  EXPORT_SYMBOL(get_sb_pseudo);
>  EXPORT_SYMBOL(simple_write_begin);
>  EXPORT_SYMBOL(simple_write_end);
> +EXPORT_SYMBOL(simple_setsize);
>  EXPORT_SYMBOL(simple_dir_inode_operations);
>  EXPORT_SYMBOL(simple_dir_operations);
>  EXPORT_SYMBOL(simple_empty);
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -1527,6 +1527,7 @@ struct inode_operations {
>  	void * (*follow_link) (struct dentry *, struct nameidata *);
>  	void (*put_link) (struct dentry *, struct nameidata *, void *);
>  	void (*truncate) (struct inode *);
> +	int new_truncate; /* nasty hack to transition to new truncate code */
>  	int (*permission) (struct inode *, int);
>  	int (*setattr) (struct dentry *, struct iattr *);
>  	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
> @@ -2332,6 +2333,7 @@ extern int simple_link(struct dentry *,
>  extern int simple_unlink(struct inode *, struct dentry *);
>  extern int simple_rmdir(struct inode *, struct dentry *);
>  extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> +extern int simple_setsize(struct inode *inode, loff_t newsize);
>  extern int simple_sync_file(struct file *, struct dentry *, int);
>  extern int simple_empty(struct dentry *);
>  extern int simple_readpage(struct file *file, struct page *page);
> Index: linux-2.6/fs/buffer.c
> ===================================================================
> --- linux-2.6.orig/fs/buffer.c
> +++ linux-2.6/fs/buffer.c
> @@ -1992,9 +1992,14 @@ int block_write_begin(struct file *file,
>  			 * prepare_write() may have instantiated a few blocks
>  			 * outside i_size.  Trim these off again. Don't need
>  			 * i_size_read because we hold i_mutex.
> +			 *
> +			 * Filesystems which define i_op->new_truncate must
> +			 * handle this themselves. Eventually this will go
> +			 * away because everyone will be converted.
>  			 */
>  			if (pos + len > inode->i_size)
> -				vmtruncate(inode, inode->i_size);
> +				if (!inode->i_op->new_truncate)
> +					vmtruncate(inode, inode->i_size);
>  		}
>  	}
>  
> @@ -2371,7 +2376,7 @@ int block_commit_write(struct page *page
>   *
>   * We are not allowed to take the i_mutex here so we have to play games to
>   * protect against truncate races as the page could now be beyond EOF.  Because
> - * vmtruncate() writes the inode size before removing pages, once we have the
> + * truncate writes the inode size before removing pages, once we have the
>   * page lock we can determine safely if the page is beyond EOF. If it is not
>   * beyond EOF, then the page is guaranteed safe against truncation until we
>   * unlock the page.
> @@ -2595,7 +2600,8 @@ out_release:
>  	*pagep = NULL;
>  
>  	if (pos + len > inode->i_size)
> -		vmtruncate(inode, inode->i_size);
> +		if (!inode->i_op->new_truncate)
> +			vmtruncate(inode, inode->i_size);
>  
>  	return ret;
>  }
> Index: linux-2.6/fs/direct-io.c
> ===================================================================
> --- linux-2.6.orig/fs/direct-io.c
> +++ linux-2.6/fs/direct-io.c
> @@ -1210,14 +1210,15 @@ __blockdev_direct_IO(int rw, struct kioc
>  	/*
>  	 * In case of error extending write may have instantiated a few
>  	 * blocks outside i_size. Trim these off again for DIO_LOCKING.
> -	 * NOTE: DIO_NO_LOCK/DIO_OWN_LOCK callers have to handle this by
> -	 * it's own meaner.
> +	 * NOTE: DIO_NO_LOCK/DIO_OWN_LOCK callers have to handle this in
> +	 * their own manner.
>  	 */
>  	if (unlikely(retval < 0 && (rw & WRITE))) {
>  		loff_t isize = i_size_read(inode);
>  
>  		if (end > isize && dio_lock_type == DIO_LOCKING)
> -			vmtruncate(inode, isize);
> +			if (!inode->i_op->new_truncate)
> +				vmtruncate(inode, isize);
>  	}
>  
>  	if (rw == READ && dio_lock_type == DIO_LOCKING)
> Index: linux-2.6/fs/attr.c
> ===================================================================
> --- linux-2.6.orig/fs/attr.c
> +++ linux-2.6/fs/attr.c
> @@ -112,7 +112,12 @@ int inode_setattr(struct inode * inode,
>  
>  	if (ia_valid & ATTR_SIZE &&
>  	    attr->ia_size != i_size_read(inode)) {
> -		int error = vmtruncate(inode, attr->ia_size);
> +		int error;
> +
> +		if (inode->i_op->new_truncate)
> +			error = simple_setsize(inode, attr->ia_size);

I don't understand this branch.
If a filesystem has been converted to set "i_op->new_truncate=true"
then it must have been converted to intersect ->setattr and has set
the i_size (And needs to clear ATTR_SIZE, why?)

All other cases of systems not converted, or systems that do not have
->truncate will fall to the "else" part.

before the removal of i_op->new_truncate you will need to do something
with the systems that do not have ->truncate which will be a 
.setattr = simple_setattr or something

So I don't understand this conditional

> +		else
> +			error = vmtruncate(inode, attr->ia_size);
>  		if (error)
>  			return error;
>  	}
> Index: linux-2.6/Documentation/filesystems/vfs.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/filesystems/vfs.txt
> +++ linux-2.6/Documentation/filesystems/vfs.txt
> @@ -401,11 +401,16 @@ otherwise noted.
>    	started might not be in the page cache at the end of the
>    	walk).
>  
> -  truncate: called by the VFS to change the size of a file.  The
> +  truncate: Deprecated. This will not be called if ->setsize is defined.
> +	Called by the VFS to change the size of a file.  The
>   	i_size field of the inode is set to the desired size by the
>   	VFS before this method is called.  This method is called by
>   	the truncate(2) system call and related functionality.
>  
> +	Note: ->truncate and vmtruncate are deprecated. Do not add new
> +	instances/calls of these. Filesystems shoud be converted to do their
> +	truncate sequence via ->setattr().
> +
>    permission: called by the VFS to check for access rights on a POSIX-like
>    	filesystem.
>  
> Index: linux-2.6/mm/truncate.c
> ===================================================================
> --- linux-2.6.orig/mm/truncate.c
> +++ linux-2.6/mm/truncate.c
> @@ -513,12 +513,10 @@ int vmtruncate(struct inode * inode, lof
>  	loff_t oldsize;
>  	int error;
>  
> -	error = inode_newsize_ok(inode, offset);
> +	error = simple_setsize(inode, offset);
>  	if (error)
>  		return error;
> -	oldsize = inode->i_size;
> -	i_size_write(inode, offset);
> -	truncate_pagecache(inode, oldsize, offset);
> +
>  	if (inode->i_op->truncate)
>  		inode->i_op->truncate(inode);
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux