On Mon, Jan 15, 2018 at 2:19 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote: >> >> 在 2018年1月15日,下午7:30,Amir Goldstein <amir73il@xxxxxxxxx> 写道: >> >> On Mon, Jan 15, 2018 at 9:33 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote: >>> Currently syncfs(2) call on overlayfs just simply call sync_filesystem() >>> on upper_sb to synchronize whole dirty inodes in upper filesystem >>> regardless of the overlay ownership of the inode. In the use case of >>> container, when multiple containers using the same underlying upper >>> filesystem, it has some shortcomings as below. >>> >>> (1) Performance >>> Synchronization is probably heavy because it actually syncs unnecessary >>> inodes for target overlayfs. >>> >>> (2) Interference >>> Unplanned synchronization will probably impact IO performance of >>> irrelative container processes on the other overlayfs. >> >> s/irrelative/unrelated >> >>> >>> This patch iterates upper inode list in overlayfs to only sync target >>> dirty inodes and wait for completion. By doing this, It is able to reduce >>> cost of synchronization and will not seriously impact IO performance of >>> irrelative processes. In special case, when having very few dirty inodes >> >> unrelated processes >> >>> and a lot of clean upper inodes in overlayfs, then iteration may waste >>> time so that synchronization is slower than before, but we think the >>> potential for performance improvement far surpass the potential for >>> performance regression in most cases. >>> >>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx> >> >> Beyond Miklos's comment on removing inodes from list, a few more >> comments and nits below. >> >>> --- [...] >>> +/** >>> + * ovl_sync_filesystem >>> + * @sb: The overlayfs super block >>> + * >>> + * Sync underlying dirty inodes in upper filesystem and wait for completion. >>> + */ >>> +static int ovl_sync_filesystem(struct super_block *sb) >>> +{ >>> + struct ovl_fs *ofs = sb->s_fs_info; >>> + struct super_block *upper_sb = ofs->upper_mnt->mnt_sb; >>> + >>> + ovl_sync_inodes(sb); >>> + ovl_wait_inodes(sb); >>> + >>> + if (upper_sb->s_op->sync_fs) >>> + upper_sb->s_op->sync_fs(upper_sb, 1); >>> + >>> + return sync_blockdev(upper_sb->s_bdev); >>> +} >>> + >>> /* Sync real dirty inodes in upper filesystem (if it exists) */ >>> static int ovl_sync_fs(struct super_block *sb, int wait) >>> { >>> @@ -266,17 +422,19 @@ static int ovl_sync_fs(struct super_block *sb, int wait) >>> * If this is a sync(2) call or an emergency sync, all the super blocks >>> * will be iterated, including upper_sb, so no need to do anything. >>> * >>> - * If this is a syncfs(2) call, then we do need to call >>> - * sync_filesystem() on upper_sb, but enough if we do it when being >>> + * If this is a syncfs(2) call, then we do need to call sync_inode() >>> + * on underlying dirty upper_inode, but enough if we do it when being >>> * called with wait == 1. >>> */ >>> if (!wait) >>> return 0; >>> >>> upper_sb = ofs->upper_mnt->mnt_sb; >>> + if (sb_rdonly(upper_sb)) >>> + return 0; >>> >>> down_read(&upper_sb->s_umount); >>> - ret = sync_filesystem(upper_sb); >>> + ret = ovl_sync_filesystem(sb); >>> up_read(&upper_sb->s_umount); >> >> The boundaries between ovl_sync_fs() and ovl_sync_filesystem() are now very >> strange.. taking upper_sb s_umount lock outside and passing in sb etc. >> I am not sure that ovl_sync/wait_inodes() should be called under upper_sb >> s_umount lock at all. Remember we are already holding sb s_umount lock. >> I suggest, now that ovl_sync/wait_inodes() have been factored out, to squash >> ovl_sync_filesystem() back into ovl_sync_fs() and hold the s_umount lock >> only around s_op->sync_fs() and sync_blockdev(). > > From the comment in sync_filesystem, s_umount is used for against the filesystem going > from r/o to r/w or vice versa. so I think it is necessary and should move readonly > check inside ovl_sync_filesystem() to get the protection. > Right, but the interface between the 2 functions looks weird, so please either get rid of ovl_sync_filesystem() entirely and move its code into ovl_sync_fs() under the s_umount lock where needed or move the s_umount lock and sb_rdonly() test into ovl_sync_filesystem(sb), which leaves ovl_sync_fs() pretty slim. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html