Re: sync filesystem of overlayfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> 在 2017年11月27日,下午4:59,Amir Goldstein <amir73il@xxxxxxxxx> 写道:
> 
> On Mon, Nov 27, 2017 at 10:08 AM, cgxu <cgxu@xxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> Recently I found syncfs and umount could not make data consistency in overlayfs.
>> I did some investigations and now I know synchronization didn’t directly happen
>> in upperdir filesystem during these operations executed in overlayfs.
>> 
> 
> Can you share the method of your investigation?
> Maybe make it into an xfstest?
> You can probably use dmflakey target.

Testing method as below. (I tested in virtual machine)

1. stop bd_writeback behavior by adjusting parameters in /proc/sys/vm/
2. copy a large file to overlayfs directory and then do syncfs or umount.
3. power off && power on
4. compare content of files using diff.

and I also applied below simple quick patch (may need more check for detail) to check fixing result.

-----------
$ git diff
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 7b64d5c..3ab25a7 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1222,11 +1222,24 @@ static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags,
        return mount_nodev(fs_type, flags, raw_data, ovl_fill_super);
 }

+static void ovl_kill_super(struct super_block *sb)
+{
+        struct ovl_fs *ufs = sb->s_fs_info;
+       struct super_block *real_super = ufs->upper_mnt->mnt_sb;
+
+        if (!(sb->s_flags & MS_RDONLY)) {
+                down_read(&real_super->s_umount);
+                sync_filesystem(real_super);
+                up_read(&real_super->s_umount);
+        }
+        kill_anon_super(sb);
+}
+
 static struct file_system_type ovl_fs_type = {
        .owner          = THIS_MODULE,
        .name           = "overlay",
        .mount          = ovl_mount,
-       .kill_sb        = kill_anon_super,
+       .kill_sb        = ovl_kill_super,
 };
 MODULE_ALIAS_FS("overlay");

diff --git a/fs/sync.c b/fs/sync.c
index 83ac79a..d49792c 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -161,6 +161,8 @@ void emergency_sync(void)
        if (!f.file)
                return -EBADF;
        sb = f.file->f_path.dentry->d_sb;
+       if (!(sb->s_flags & MS_RDONLY))
+               sb = d_real(f.file->f_path.dentry, NULL, f.file->f_flags, 0)->d_sb;

        down_read(&sb->s_umount);
        ret = sync_filesystem(sb);


-----------


> 
>> I'm also confused by below comment in function ovl_sync_fs() in super.c
>> 
>> /* real inodes have already been synced by sync_filesystem(ovl_sb) */
>> 
>> In my understanding sync_filesystem(ovl_sb) can not make real inodes synced
>> unless delivers real superblock of upperdir filesystem to sync_filesystem().
>> 
>> I can’t make sure current implementation is by design or for some other reasons because
>> sync upperdir filesystem may be a little bit heavy especially in a large filesystem.
>> Could anyone give me a hint for this?
>> 
> 
> Looks like you are right.
> That comment I wrote is an oversight, not by design, although the concern
> that you raise is valid.
> 
>> If this problem needs to be fix, I’ll try to make a patch for review.
>> 
> 
> So the simple way would be to call __sync_filesystem() from ovl_sync_fs()
> instead of calling upper_sb->s_op->sync_fs(), but that would be heavy
> as you said.
> A better fix would be to iterate overlay sb inodes and flush only the dirty
> ovl_inode_upper() pages, before calling upper_sb->s_op->sync_fs().
> 
> I didn't look closely to see how complex that would be, so let me know what
> you think is best.

The easiest way is as you said or like quick patch above, 
but sync_fs() seems like doing superblock related things only in most case.

For me maybe ideal way is define new virtual BDI for overlayfs,
and using that to organize dirty inodes, so that we don’t have to put much
effort to looking for target inodes. If this approach makes sense I’ll
carefully consider detail.

We can just fix this problem in simple way first and consider
better solution later because that is probably related to many complex things.

I think for most people umount or sync_fs takes time is acceptable and at least
it is much better than data inconsistency.



> 
> Thanks,
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Best Regards,
Chengguang


--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux