Re: sync filesystem of overlayfs

Amir Goldstein <amir73il@xxxxxxxxx> · Mon, 27 Nov 2017 13:30:35 +0200

On Mon, Nov 27, 2017 at 12:31 PM, cgxu <cgxu@xxxxxxxxxxxx> wrote:
>
>
>> 在 2017年11月27日，下午4:59，Amir Goldstein <amir73il@xxxxxxxxx> 写道：
>>
>> On Mon, Nov 27, 2017 at 10:08 AM, cgxu <cgxu@xxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> Recently I found syncfs and umount could not make data consistency in overlayfs.
>>> I did some investigations and now I know synchronization didn’t directly happen
>>> in upperdir filesystem during these operations executed in overlayfs.
>>>
>>
>> Can you share the method of your investigation?
>> Maybe make it into an xfstest?
>> You can probably use dmflakey target.
>
> Testing method as below. (I tested in virtual machine)
>
> 1. stop bd_writeback behavior by adjusting parameters in /proc/sys/vm/
> 2. copy a large file to overlayfs directory and then do syncfs or umount.
> 3. power off && power on
> 4. compare content of files using diff.
>
> and I also applied below simple quick patch (may need more check for detail) to check fixing result.
>
> -----------
> $ git diff
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 7b64d5c..3ab25a7 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -1222,11 +1222,24 @@ static struct dentry *ovl_mount(struct file_system_type *fs_type, int flags,
>         return mount_nodev(fs_type, flags, raw_data, ovl_fill_super);
>  }
>
> +static void ovl_kill_super(struct super_block *sb)
> +{
> +        struct ovl_fs *ufs = sb->s_fs_info;
> +       struct super_block *real_super = ufs->upper_mnt->mnt_sb;
> +
> +        if (!(sb->s_flags & MS_RDONLY)) {
> +                down_read(&real_super->s_umount);
> +                sync_filesystem(real_super);
> +                up_read(&real_super->s_umount);
> +        }
> +        kill_anon_super(sb);
> +}
> +

kill_anon_super() -> generic_shutdown_super() -> sync_filesystem()

So your implementation should go into ovl_sync_fs instead to cover
other cases like freeze_super() and plain syncfs().

>  static struct file_system_type ovl_fs_type = {
>         .owner          = THIS_MODULE,
>         .name           = "overlay",
>         .mount          = ovl_mount,
> -       .kill_sb        = kill_anon_super,
> +       .kill_sb        = ovl_kill_super,
>  };
>  MODULE_ALIAS_FS("overlay");
>
> diff --git a/fs/sync.c b/fs/sync.c
> index 83ac79a..d49792c 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -161,6 +161,8 @@ void emergency_sync(void)
>         if (!f.file)
>                 return -EBADF;
>         sb = f.file->f_path.dentry->d_sb;
> +       if (!(sb->s_flags & MS_RDONLY))
> +               sb = d_real(f.file->f_path.dentry, NULL, f.file->f_flags, 0)->d_sb;
>

No reason to change vfs.
sync_filesystem() calls ovl_sync_fs() where upper fs can be synced.

>         down_read(&sb->s_umount);
>         ret = sync_filesystem(sb);
>
>
> -----------
>
>
>>
>>> I'm also confused by below comment in function ovl_sync_fs() in super.c
>>>
>>> /* real inodes have already been synced by sync_filesystem(ovl_sb) */
>>>
>>> In my understanding sync_filesystem(ovl_sb) can not make real inodes synced
>>> unless delivers real superblock of upperdir filesystem to sync_filesystem().
>>>
>>> I can’t make sure current implementation is by design or for some other reasons because
>>> sync upperdir filesystem may be a little bit heavy especially in a large filesystem.
>>> Could anyone give me a hint for this？
>>>
>>
>> Looks like you are right.
>> That comment I wrote is an oversight, not by design, although the concern
>> that you raise is valid.
>>
>>> If this problem needs to be fix, I’ll try to make a patch for review.
>>>
>>
>> So the simple way would be to call __sync_filesystem() from ovl_sync_fs()
>> instead of calling upper_sb->s_op->sync_fs(), but that would be heavy
>> as you said.
>> A better fix would be to iterate overlay sb inodes and flush only the dirty
>> ovl_inode_upper() pages, before calling upper_sb->s_op->sync_fs().
>>
>> I didn't look closely to see how complex that would be, so let me know what
>> you think is best.
>
> The easiest way is as you said or like quick patch above,
> but sync_fs() seems like doing superblock related things only in most case.
>

It does everything that generic vfs sync_filesystem() doesn't do.
In overlayfs case, that is syncing upper fs.

> For me maybe ideal way is define new virtual BDI for overlayfs,
> and using that to organize dirty inodes, so that we don’t have to put much
> effort to looking for target inodes. If this approach makes sense I’ll
> carefully consider detail.
>
> We can just fix this problem in simple way first and consider
> better solution later because that is probably related to many complex things.
>

Agree. Safety first!

> I think for most people umount or sync_fs takes time is acceptable and at least
> it is much better than data inconsistency.
>

Not sure about that. I heard there are use cases for containers for
which the time
of mount/umount matters.
We'll have to see about that.

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html