Re: [PATCH v4] ovl: Improving syncfs efficiency

Chengguang Xu <cgxu519@xxxxxxxxxx> · Mon, 15 Jan 2018 20:35:11 +0800

> 
> 在 2018年1月15日，下午8:27，Amir Goldstein <amir73il@xxxxxxxxx> 写道：
> 
> On Mon, Jan 15, 2018 at 2:19 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>> 
>>> 在 2018年1月15日，下午7:30，Amir Goldstein <amir73il@xxxxxxxxx> 写道：
>>> 
>>> On Mon, Jan 15, 2018 at 9:33 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>> Currently syncfs(2) call on overlayfs just simply call sync_filesystem()
>>>> on upper_sb to synchronize whole dirty inodes in upper filesystem
>>>> regardless of the overlay ownership of the inode. In the use case of
>>>> container, when multiple containers using the same underlying upper
>>>> filesystem, it has some shortcomings as below.
>>>> 
>>>> (1) Performance
>>>> Synchronization is probably heavy because it actually syncs unnecessary
>>>> inodes for target overlayfs.
>>>> 
>>>> (2) Interference
>>>> Unplanned synchronization will probably impact IO performance of
>>>> irrelative container processes on the other overlayfs.
>>> 
>>> s/irrelative/unrelated
>>> 
>>>> 
>>>> This patch iterates upper inode list in overlayfs to only sync target
>>>> dirty inodes and wait for completion. By doing this, It is able to reduce
>>>> cost of synchronization and will not seriously impact IO performance of
>>>> irrelative processes. In special case, when having very few dirty inodes
>>> 
>>> unrelated processes
>>> 
>>>> and a lot of clean upper inodes in overlayfs, then iteration may waste
>>>> time so that synchronization is slower than before, but we think the
>>>> potential for performance improvement far surpass the potential for
>>>> performance regression in most cases.
>>>> 
>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>> 
>>> Beyond Miklos's comment on removing inodes from list, a few more
>>> comments and nits below.
>>> 
>>>> ---
> [...]
>>>> +/**
>>>> + * ovl_sync_filesystem
>>>> + * @sb: The overlayfs super block
>>>> + *
>>>> + * Sync underlying dirty inodes in upper filesystem and wait for completion.
>>>> + */
>>>> +static int ovl_sync_filesystem(struct super_block *sb)
>>>> +{
>>>> +       struct ovl_fs *ofs = sb->s_fs_info;
>>>> +       struct super_block *upper_sb = ofs->upper_mnt->mnt_sb;
>>>> +
>>>> +       ovl_sync_inodes(sb);
>>>> +       ovl_wait_inodes(sb);
>>>> +
>>>> +       if (upper_sb->s_op->sync_fs)
>>>> +               upper_sb->s_op->sync_fs(upper_sb, 1);
>>>> +
>>>> +       return sync_blockdev(upper_sb->s_bdev);
>>>> +}
>>>> +
>>>> /* Sync real dirty inodes in upper filesystem (if it exists) */
>>>> static int ovl_sync_fs(struct super_block *sb, int wait)
>>>> {
>>>> @@ -266,17 +422,19 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
>>>>        * If this is a sync(2) call or an emergency sync, all the super blocks
>>>>        * will be iterated, including upper_sb, so no need to do anything.
>>>>        *
>>>> -        * If this is a syncfs(2) call, then we do need to call
>>>> -        * sync_filesystem() on upper_sb, but enough if we do it when being
>>>> +        * If this is a syncfs(2) call, then we do need to call sync_inode()
>>>> +        * on underlying dirty upper_inode, but enough if we do it when being
>>>>        * called with wait == 1.
>>>>        */
>>>>       if (!wait)
>>>>               return 0;
>>>> 
>>>>       upper_sb = ofs->upper_mnt->mnt_sb;
>>>> +       if (sb_rdonly(upper_sb))
>>>> +               return 0;
>>>> 
>>>>       down_read(&upper_sb->s_umount);
>>>> -       ret = sync_filesystem(upper_sb);
>>>> +       ret = ovl_sync_filesystem(sb);
>>>>       up_read(&upper_sb->s_umount);
>>> 
>>> The boundaries between ovl_sync_fs() and ovl_sync_filesystem() are now very
>>> strange.. taking upper_sb s_umount  lock outside and passing in sb etc.
>>> I am not sure that ovl_sync/wait_inodes() should be called under upper_sb
>>> s_umount lock at all. Remember we are already holding sb s_umount lock.
>>> I suggest, now that ovl_sync/wait_inodes() have been factored out, to squash
>>> ovl_sync_filesystem() back into ovl_sync_fs() and hold the s_umount lock
>>> only around s_op->sync_fs() and sync_blockdev().
>> 
>> From the comment in sync_filesystem, s_umount is used for against the filesystem going
>> from r/o to r/w or vice versa. so I think it is necessary and should move readonly
>> check inside ovl_sync_filesystem() to get the protection.
>> 
> 
> Right, but the interface between the 2 functions looks weird, so
> please either get rid of
> ovl_sync_filesystem() entirely and move its code into ovl_sync_fs()
> under the s_umount
> lock where needed or move the s_umount lock and sb_rdonly() test into
> ovl_sync_filesystem(sb), which leaves ovl_sync_fs() pretty slim.

OKAY, I take the latter option.

Thanks,
Chengguang.

--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html