---- On Monday, 2021-11-22 11:00:31 Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote ----
> From: Chengguang Xu <charliecgxu@xxxxxxxxxxx>
>
> Current syncfs(2) syscall on overlayfs just calls sync_filesystem()
> on upper_sb to synchronize all dirty inodes in the upper filesystem
> regardless of the overlay ownership of the inode. In the container use
> case, when multiple containers use the same underlying upper
> filesystem, it has some shortcomings as below.
>
> (1) Performance
> Synchronization is probably heavy because it actually syncs inodes
> that are unnecessary for the target overlayfs.
>
> (2) Interference
> Unplanned synchronization will probably impact IO performance of
> unrelated container processes on the other overlayfs.
>
> This series tries to implement containerized syncfs for overlayfs so that
> it only syncs target dirty upper inodes that belong to a specific overlayfs
> instance. By doing this, it is able to reduce the cost of synchronization and
> will not seriously impact IO performance of unrelated processes.
>
> v1->v2:
> - Mark overlayfs' inode dirty itself instead of adding notification
> mechanism to vfs inode.
>
> v2->v3:
> - Introduce overlayfs' extra syncfs wait list to wait target upper inodes
> in ->sync_fs.
>
> v3->v4:
> - Using wait_sb_inodes() to wait syncing upper inodes.
> - Mark overlay inode dirty only when having upper inode and VM_SHARED
> flag in ovl_mmap().
> - Check upper i_state after checking upper mmap state
> in ovl_write_inode.
>
> v4->v5:
> - Add underlying inode dirtiness check after mnt_drop_write().
> - Handle both wait/no-wait mode of syncfs(2) in overlayfs' ->sync_fs().
>
> v5->v6:
> - Rebase to latest overlayfs-next tree.
> - Mark overlay inode dirty when it has upper instead of marking dirty on
> modification.
> - Trigger dirty page writeback in overlayfs' ->write_inode().
> - Mark overlay inode 'DONTCACHE' flag.
> - Delete overlayfs' ->writepages() and ->evict_inode() operations.

Hi Miklos,

Have you had time to look at this v6 series?

I think this version addresses the issues raised in your earlier feedback
and passes fstests (generic/overlay cases). I also ran long-running stress
tests (tar & syncfs & diff, with and without copy-up) and found no obvious
problems.

For syncfs with 1M clean upper inodes, about 1.3s extra was spent waiting
on scheduling. I expect this 1.3s will not have a significant impact on a
container instance in most cases. I also agree with Jack that we can start
with this approach and make improvements afterwards if any real users
complain.

Thanks,
Chengguang

>
> Chengguang Xu (7):
> ovl: setup overlayfs' private bdi
> ovl: mark overlayfs inode dirty when it has upper
> ovl: implement overlayfs' own ->write_inode operation
> ovl: set 'DONTCACHE' flag for overlayfs inode
> fs: export wait_sb_inodes()
> ovl: introduce ovl_sync_upper_blockdev()
> ovl: implement containerized syncfs for overlayfs
>
> fs/fs-writeback.c | 3 ++-
> fs/overlayfs/inode.c | 5 +++-
> fs/overlayfs/super.c | 49 ++++++++++++++++++++++++++++++++-------
> fs/overlayfs/util.c | 1 +
> include/linux/writeback.h | 1 +
> 5 files changed, 48 insertions(+), 11 deletions(-)
>
> --
> 2.27.0
>
>
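
A syncfs(2) latency figure like the 1.3s quoted above could be measured from
userspace roughly as follows. This is only a minimal sketch, not the test
harness used for the numbers in this mail: the helper name time_syncfs() is
hypothetical, and it assumes a Linux system with glibc, where syncfs() is
exposed under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): open `path`,
 * call syncfs(2) on the filesystem that contains it, and return the
 * wall-clock time spent in the syscall in seconds.
 * Returns -1.0 if the path cannot be opened or syncfs() fails.
 */
double time_syncfs(const char *path)
{
	struct timespec start, end;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1.0;

	/* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = syncfs(fd);
	clock_gettime(CLOCK_MONOTONIC, &end);

	close(fd);
	if (ret != 0)
		return -1.0;

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it on the merged directory of an overlay mount (for example a
hypothetical /mnt/merged) times that overlayfs instance's ->sync_fs() path;
comparing the result before and after the series, with other containers
dirtying the same upper filesystem, would show the effect described in the
cover letter.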