On Mon, Nov 9, 2020 at 10:34 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@xxxxxxxxx> 撰写 ---- > > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > > > > > ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@xxxxxxxxxxxx> 撰写 ---- > > > > Introduce a helper ovl_wait_wb_inodes() to wait until all > > > > target upper inodes finish writeback. > > > > > > > > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxxxx> > > > > --- > > > > fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++ > > > > 1 file changed, 30 insertions(+) > > > > > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > > > index e5607a908d82..9a535fc11221 100644 > > > > --- a/fs/overlayfs/super.c > > > > +++ b/fs/overlayfs/super.c > > > > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb) > > > > ovl_free_fs(ofs); > > > > } > > > > > > > > +void ovl_wait_wb_inodes(struct ovl_fs *ofs) > > > > +{ > > > > + LIST_HEAD(tmp_list); > > > > + struct ovl_inode *oi; > > > > + struct inode *upper; > > > > + > > > > + spin_lock(&ofs->syncfs_wait_list_lock); > > > > + list_splice_init(&ofs->syncfs_wait_list, &tmp_list); > > > > + > > > > + while (!list_empty(&tmp_list)) { > > > > + oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list); > > > > + list_del_init(&oi->wait_list); > > > > + ihold(&oi->vfs_inode); > > > > > > Maybe I overlooked race condition with inode eviction, so still need to introduce > > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series. > > > > > > > I am not sure why you added the ovl wait list. > > > > I think you misunderstood Jan's suggestion. > > I think what Jan meant is that ovl_sync_fs() should call > > wait_sb_inodes(upper_sb) > > to wait for writeback of ALL upper inodes after sync_filesystem() > > started writeback > > only on this ovl instance upper inodes. > > > > > Maybe you are right, the wait list is just for accuracy that can completely > avoid interferes between ovl instances, otherwise we may need to face > waiting interferes in high density environment. > > > > I am not sure if this is acceptable or not - it is certainly an improvement over > > current situation, but I have a feeling that on a large scale (many > > containers) it > > won't be enough. > > > > The same as your thought. > > > > The idea was to keep it simple without over optimizing, since anyway > > you are going for the "correct" solution long term (ovl inode aops), > > so I wouldn't > > add the wait list. > > > > Maybe, I think it depends on how to implement ovl page-cache, so at current > stage I have no idea for the wait list. > > > > As long as the upper inode is still dirty, we can keep the ovl inode in cache, > > so the worst outcome is that drop_caches needs to get called twice before the > > ovl inode can be evicted, no? > > > > IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim, > also there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput. > So should we add a shrinker in this series? > Would that add a lot of complexity? Thinking out loud: maybe we follow Jan's suggestion and fix remaining performance with followup series? Thanks, Amir.