On Mon, Nov 27, 2017 at 12:31 PM, cgxu <cgxu@xxxxxxxxxxxx> wrote: > > >> 在 2017年11月27日,下午4:59,Amir Goldstein <amir73il@xxxxxxxxx> 写道: >> >> On Mon, Nov 27, 2017 at 10:08 AM, cgxu <cgxu@xxxxxxxxxxxx> wrote: >>> >>> Hi, >>> >>> Recently I found syncfs and umount could not make data consistency in overlayfs. >>> I did some investigations and now I know synchronization didn’t directly happen >>> in upperdir filesystem during these operations executed in overlayfs. >>> >> >> Can you share the method of your investigation? >> Maybe make it into an xfstest? >> You can probably use dmflakey target. > > Testing method as below. (I tested in virtual machine) > > 1. stop bd_writeback behavior by adjusting parameters in /proc/sys/vm/ > 2. copy a large file to overlayfs directory and then do syncfs or umount. > 3. power off && power on > 4. compare content of files using diff. > > and I also applied below simple quick patch (may need more check for detail) to check fixing result. > > ----------- > $ git diff > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index 7b64d5c..3ab25a7 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -1222,11 +1222,24 @@ static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags, > return mount_nodev(fs_type, flags, raw_data, ovl_fill_super); > } > > +static void ovl_kill_super(struct super_block *sb) > +{ > + struct ovl_fs *ufs = sb->s_fs_info; > + struct super_block *real_super = ufs->upper_mnt->mnt_sb; > + > + if (!(sb->s_flags & MS_RDONLY)) { > + down_read(&real_super->s_umount); > + sync_filesystem(real_super); > + up_read(&real_super->s_umount); > + } > + kill_anon_super(sb); > +} > + kill_anon_super() -> generic_shutdown_super() -> sync_filesystem() So your implementation should go into ovl_sync_fs instead to cover other cases like freeze_super() and plain syncfs(). > static struct file_system_type ovl_fs_type = { > .owner = THIS_MODULE, > .name = "overlay", > .mount = ovl_mount, > - .kill_sb = kill_anon_super, > + .kill_sb = ovl_kill_super, > }; > MODULE_ALIAS_FS("overlay"); > > diff --git a/fs/sync.c b/fs/sync.c > index 83ac79a..d49792c 100644 > --- a/fs/sync.c > +++ b/fs/sync.c > @@ -161,6 +161,8 @@ void emergency_sync(void) > if (!f.file) > return -EBADF; > sb = f.file->f_path.dentry->d_sb; > + if (!(sb->s_flags & MS_RDONLY)) > + sb = d_real(f.file->f_path.dentry, NULL, f.file->f_flags, 0)->d_sb; > No reason to change vfs. sync_filesystem() calls ovl_sync_fs() where upper fs can be synced. > down_read(&sb->s_umount); > ret = sync_filesystem(sb); > > > ----------- > > >> >>> I'm also confused by below comment in function ovl_sync_fs() in super.c >>> >>> /* real inodes have already been synced by sync_filesystem(ovl_sb) */ >>> >>> In my understanding sync_filesystem(ovl_sb) can not make real inodes synced >>> unless delivers real superblock of upperdir filesystem to sync_filesystem(). >>> >>> I can’t make sure current implementation is by design or for some other reasons because >>> sync upperdir filesystem may be a little bit heavy especially in a large filesystem. >>> Could anyone give me a hint for this? >>> >> >> Looks like you are right. >> That comment I wrote is an oversight, not by design, although the concern >> that you raise is valid. >> >>> If this problem needs to be fix, I’ll try to make a patch for review. >>> >> >> So the simple way would be to call __sync_filesystem() from ovl_sync_fs() >> instead of calling upper_sb->s_op->sync_fs(), but that would be heavy >> as you said. >> A better fix would be to iterate overlay sb inodes and flush only the dirty >> ovl_inode_upper() pages, before calling upper_sb->s_op->sync_fs(). >> >> I didn't look closely to see how complex that would be, so let me know what >> you think is best. > > The easiest way is as you said or like quick patch above, > but sync_fs() seems like doing superblock related things only in most case. > It does everything that generic vfs sync_filesystem() doesn't do. In overlayfs case, that is syncing upper fs. > For me maybe ideal way is define new virtual BDI for overlayfs, > and using that to organize dirty inodes, so that we don’t have to put much > effort to looking for target inodes. If this approach makes sense I’ll > carefully consider detail. > > We can just fix this problem in simple way first and consider > better solution later because that is probably related to many complex things. > Agree. Safety first! > I think for most people umount or sync_fs takes time is acceptable and at least > it is much better than data inconsistency. > Not sure about that. I heard there are use cases for containers for which the time of mount/umount matters. We'll have to see about that. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html