Re: [RFC PATCH v3 09/10] ovl: introduce helper of syncfs writeback inode waiting

Chengguang Xu <cgxu519@xxxxxxxxxxxx> · Mon, 09 Nov 2020 16:34:10 +0800

 ---- 在 星期一, 2020-11-09 15:07:18 Amir Goldstein <amir73il@xxxxxxxxx> 撰写 ----
 > On Mon, Nov 9, 2020 at 5:34 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
 > >
 > >  ---- 在 星期日, 2020-11-08 22:03:06 Chengguang Xu <cgxu519@xxxxxxxxxxxx> 撰写 ----
 > >  > Introduce a helper ovl_wait_wb_inodes() to wait until all
 > >  > target upper inodes finish writeback.
 > >  >
 > >  > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxxxx>
 > >  > ---
 > >  >  fs/overlayfs/super.c | 30 ++++++++++++++++++++++++++++++
 > >  >  1 file changed, 30 insertions(+)
 > >  >
 > >  > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
 > >  > index e5607a908d82..9a535fc11221 100644
 > >  > --- a/fs/overlayfs/super.c
 > >  > +++ b/fs/overlayfs/super.c
 > >  > @@ -255,6 +255,36 @@ static void ovl_put_super(struct super_block *sb)
 > >  >      ovl_free_fs(ofs);
 > >  >  }
 > >  >
 > >  > +void ovl_wait_wb_inodes(struct ovl_fs *ofs)
 > >  > +{
 > >  > +    LIST_HEAD(tmp_list);
 > >  > +    struct ovl_inode *oi;
 > >  > +    struct inode *upper;
 > >  > +
 > >  > +    spin_lock(&ofs->syncfs_wait_list_lock);
 > >  > +    list_splice_init(&ofs->syncfs_wait_list, &tmp_list);
 > >  > +
 > >  > +    while (!list_empty(&tmp_list)) {
 > >  > +        oi = list_first_entry(&tmp_list, struct ovl_inode, wait_list);
 > >  > +        list_del_init(&oi->wait_list);
 > >  > +        ihold(&oi->vfs_inode);
 > >
 > > Maybe I overlooked race condition with inode eviction, so still need to introduce
 > > OVL_EVICT_PENDING flag just like we did in old syncfs efficiency patch series.
 > >
 > 
 > I am not sure why you added the ovl wait list.
 > 
 > I think you misunderstood Jan's suggestion.
 > I think what Jan meant is that ovl_sync_fs() should call
 > wait_sb_inodes(upper_sb)
 > to wait for writeback of ALL upper inodes after sync_filesystem()
 > started writeback
 > only on this ovl instance upper inodes.
 > 

Maybe you are right, the wait list is just for accuracy that can completely
avoid interferes between ovl instances, otherwise we may need to face
waiting interferes  in high density environment. 

 > I am not sure if this is acceptable or not - it is certainly an improvement over
 > current situation, but I have a feeling that on a large scale (many
 > containers) it
 > won't be enough.
 > 

The same as your thought. 

 > The idea was to keep it simple without over optimizing, since anyway
 > you are going for the "correct" solution long term (ovl inode aops),
 > so I wouldn't
 > add the wait list.
 > 

Maybe, I think it depends on how to implement ovl page-cache, so at current
stage I have no idea for the wait list.

 > As long as the upper inode is still dirty, we can keep the ovl inode in cache,
 > so the worst outcome is that drop_caches needs to get called twice before the
 > ovl inode can be evicted, no?
 > 

IIUC, since currently ovl does not have it's own page-cache, so there is no affect to page-cache reclaim, 
also  there is no ovl shrinker to reclaim slab because we drop ovl inode directly after final iput. 
So should we add a shrinker in this series?

Thanks,
Chengguang