Re: [RFC PATCH] fs: file freeze support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



  Hello,

On Thu 15-01-15 20:36:55, Namjae Jeon wrote:
> We introduce per-file freeze feature for unifying defrag ext4 and xfs
> as first ingredient. We get the idea courtesy of Dave Chinner
> (https://lkml.org/lkml/2014/7/14/759)
> per-file freeze will be used to avoid that file is not modified while
> userspace is doing the defrag.
> 
> This patch tries to implement file write freeze functionality.
> Introduce new inode state, I_WRITE_FREEZED.
> When this state is set, any process which tries to modify the file's address
> space, either by pagefault mmap writes or using write(2), will block until
> the this state is cleared. I_WRITE_FREEZED is set by calling FS_IOC_FWFREEZE
> ioctl and clear by FS_IOC_FWTHAW ioctl.
> 
> File write freeze functionality, when used in conjunction with
> inode's immutable flag can be used for creating truly stable file snapshots
> wherein write freeze will prevent any modification to the file from already
> open file descriptors and immutable flag will prevent any new modification
> to the file. One of the intended uses for stable file snapshots would be in
> the defragmentation applications which defrags single file.
> 
> For implementation purpose, initially we tried to keep percpu usage counters
> inside struct inode just like there is struct sb_writers in super_block.
> But considering that it will significantly bloat up struct inode when actually
> the usage of file write freeze will be infrequent, we dropped this idea.
> Instead we have tried to use already present filesystem freezing infrastructure.
> Current approach makes it possible for implementing file write freeze without
> bloating any of struct super_block/inode.
> In FS_IOC_FWFREEZE, we wait for complete fs to be frozen, set I_WRITE_FREEZED to
> inode's state and unfreeze the fs. 
> 
> TODO : prevent inode eviction when I_WRITE_FREEZED state is set.
> TODO : xfstests testcase for file freeze.
> 
> Signed-off-by: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
> Signed-off-by: Ashish Sangwan <a.sangwan@xxxxxxxxxxx>
> ---
>  fs/btrfs/inode.c        |  1 +
>  fs/buffer.c             |  1 +
>  fs/ext4/inode.c         |  1 +
>  fs/f2fs/file.c          |  1 +
>  fs/gfs2/file.c          |  1 +
>  fs/inode.c              | 19 ++++++++++++++++++
>  fs/ioctl.c              | 30 +++++++++++++++++++++++++++++
>  fs/nilfs2/file.c        |  1 +
>  fs/ocfs2/mmap.c         |  1 +
>  fs/super.c              | 51 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h      | 13 +++++++++++++
>  include/uapi/linux/fs.h |  2 ++
>  mm/filemap.c            |  1 +
>  13 files changed, 123 insertions(+)
> 
...
> diff --git a/fs/inode.c b/fs/inode.c
> index aa149e7..82a96d0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1929,3 +1929,22 @@ void inode_set_flags(struct inode *inode, unsigned int flags,
>  				  new_flags) != old_flags));
>  }
>  EXPORT_SYMBOL(inode_set_flags);
> +
> +void inode_start_write(struct inode *inode)
> +{
> +	struct super_block *sb = inode->i_sb;
> +
> +retry:
> +	spin_lock(&inode->i_lock);
> +	if (inode->i_state & I_WRITE_FREEZED) {
> +		DEFINE_WAIT(wait);
> +
> +		prepare_to_wait(&sb->s_writers.wait_unfrozen, &wait,
> +				TASK_UNINTERRUPTIBLE);
> +		spin_unlock(&inode->i_lock);
> +		schedule();
> +		finish_wait(&sb->s_writers.wait_unfrozen, &wait);
> +		goto retry;
> +	}
> +	spin_unlock(&inode->i_lock);
> +}
...
> diff --git a/fs/super.c b/fs/super.c
> index eae088f..5e44e42 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1393,3 +1393,54 @@ out:
>  	return 0;
>  }
>  EXPORT_SYMBOL(thaw_super);
> +
> +int file_write_freeze(struct inode *inode)
> +{
> +	struct super_block *sb = inode->i_sb;
> +	int ret = 0;
> +
> +	if (!S_ISREG(inode->i_mode))
> +		return -EINVAL;
> +
> +	spin_lock(&inode->i_lock);
> +
> +	if (inode->i_state & I_WRITE_FREEZED)
> +		ret = -EBUSY;
> +	spin_unlock(&inode->i_lock);
> +	if (ret)
> +		return ret;
> +
> +	ret = freeze_super(sb);
> +	if (ret)
> +		return ret;
> +
> +	spin_lock(&inode->i_lock);
> +	inode->i_state |= I_WRITE_FREEZED;
> +	smp_wmb();
> +	spin_unlock(&inode->i_lock);
> +
> +	return thaw_super(sb);
> +}
> +EXPORT_SYMBOL(file_write_freeze);
> +
> +int file_write_unfreeze(struct inode *inode)
> +{
> +	struct super_block *sb = inode->i_sb;
> +
> +	if (!S_ISREG(inode->i_mode))
> +		return -EINVAL;
> +
> +	spin_lock(&inode->i_lock);
> +
> +	if (!(inode->i_state & I_WRITE_FREEZED)) {
> +		spin_unlock(&inode->i_lock);
> +		return -EINVAL;
> +	}
> +
> +	inode->i_state &= ~I_WRITE_FREEZED;
> +	smp_wmb();
> +	wake_up(&sb->s_writers.wait_unfrozen);
> +	spin_unlock(&inode->i_lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL(file_write_unfreeze);
  So I was looking at the implementation and I have a few comments:
1) The trick with freezing superblock looks nice but I'm somewhat worried
that if we wanted to heavily use per-inode freezing to defrag the whole
filesystem it may be too slow to freeze the whole fs, mark one inode as
frozen and then unfreeze the fs. But I guess we'll see that once have some
reasonably working implementation.

2) The tests you are currently doing are racy. If
things happen as:
  CPU1					CPU2
inode_start_write()
					file_write_freeze()
sb_start_pagefault()
Do modifications.

Then you have a CPU modifying a file while file_write_freeze() has
succeeded so it should be frozen.

If you swap inode_start_write() with sb_start_pagefault() the above race
doesn't happen but userspace program has to be really careful not to hit a
deadlock. E.g. if you tried to freeze two inodes the following could happen:
  CPU1					CPU2
					file_write_freeze(inode1)
fault on inode1:
sb_start_pagefault()
inode_start_write() -> blocks
					file_write_freeze(inode2)
					  blocks in freeze_super()

So I don't think this is a good scheme for inode freezing...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux