> Hello, Hi Jan. > > On Fri 16-01-15 15:48:04, Namjae Jeon wrote: > > > > +int file_write_unfreeze(struct inode *inode) > > > > +{ > > > > + struct super_block *sb = inode->i_sb; > > > > + > > > > + if (!S_ISREG(inode->i_mode)) > > > > + return -EINVAL; > > > > + > > > > + spin_lock(&inode->i_lock); > > > > + > > > > + if (!(inode->i_state & I_WRITE_FREEZED)) { > > > > + spin_unlock(&inode->i_lock); > > > > + return -EINVAL; > > > > + } > > > > + > > > > + inode->i_state &= ~I_WRITE_FREEZED; > > > > + smp_wmb(); > > > > + wake_up(&sb->s_writers.wait_unfrozen); > > > > + spin_unlock(&inode->i_lock); > > > > + return 0; > > > > +} > > > > +EXPORT_SYMBOL(file_write_unfreeze); > > > So I was looking at the implementation and I have a few comments: > > > 1) The trick with freezing superblock looks nice but I'm somewhat worried > > > that if we wanted to heavily use per-inode freezing to defrag the whole > > > filesystem it may be too slow to freeze the whole fs, mark one inode as > > > frozen and then unfreeze the fs. But I guess we'll see that once have some > > > reasonably working implementation. > > Dmitry has given a good idea to avoid multiple freeze fs and unfreeze fs > > calls. > > > > ioctl(sb,FIFREEZE) > > while (f = pop(files_list)) > > ioctl(f,FS_IOC_FWFREEZE) > > ioctl(sb,FITHAW) > > > > In file_write_freeze, we could first check if the fs is already frozen, > > if yes than we can directly set inode write freeze state after taking > > relevant lock to prevent fs_thaw while the inode state is being set. > Well, doing fs-wide freezing from userspace makes sense as Dmitry pointed > out. We can then just fail FS_IOC_FWFREEZE with error when the whole fs isn't > frozen. I'm just somewhat worried whether the fs-wide freezing won't be too > fragile. E.g. consider a situation when you are running a defrag program > which is freezing and unfreezing the filesystem and then some background > work kicks which will want to snapshot the filesystem so it will freeze & > unfreeze the fs as well. Now depending on how exactly defrag and snapshot > race one of the FIFREEZE ioctls will return EBUSY and the process > (hopefully gracefully) fails. > > This isn't a new situation - if you ran two snapshots at once, you'd see > the same failure. But the more fs-wide freezing gets used in different > places the stranger and less expected failure you'll see... Yes, Right. Thanks for your opinion. I will consider your point. > > > > 2) The tests you are currently doing are racy. If > > > things happen as: > > > CPU1 CPU2 > > > inode_start_write() > > > file_write_freeze() > > > sb_start_pagefault() > > > Do modifications. > > > > > > Then you have a CPU modifying a file while file_write_freeze() has > > > succeeded so it should be frozen. > > > > > > If you swap inode_start_write() with sb_start_pagefault() the above race > > > doesn't happen but userspace program has to be really careful not to hit a > > > deadlock. E.g. if you tried to freeze two inodes the following could happen: > > > CPU1 CPU2 > > > file_write_freeze(inode1) > > > fault on inode1: > > > sb_start_pagefault() > > > inode_start_write() -> blocks > > > file_write_freeze(inode2) > > > blocks in freeze_super() > > > > > > So I don't think this is a good scheme for inode freezing... > > To solve this race, we can fold inode_start_write with sb_start_write and use > > similar appraoch of __sb_start_write. > > How about the below scheme ? > > > > void inode_start_write(struct inode *inode) > > { > > struct super_block *sb = inode->i_sb; > > > > retry: > > > > if (unlikely(inode->i_state & I_WRITE_FREEZED)) { > > DEFINE_WAIT(wait); > > > > prepare_to_wait(&sb->s_writers.wait_unfrozen, &wait, > > TASK_UNINTERRUPTIBLE); > > schedule(); > > finish_wait(&sb->s_writers.wait_unfrozen, &wait); > > > > goto retry; > > } > > > > sb_start_write(sb); > > > > /* check if file_write_freeze race with us */ > > if (unlikely(inode->i_state & I_WRITE_FREEZED) { > > sb_end_write(sb); > > goto retry; > > } > > } > Yes, this should work. You'll need a similar wrapper for page faults but > that's easy enough. Okay, Thanks :) > > Honza > > -- > Jan Kara <jack@xxxxxxx> > SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html