---- 在 星期四, 2020-04-16 19:14:24 Jan Kara <jack@xxxxxxx> 撰写 ---- > On Thu 16-04-20 14:08:59, Chengguang Xu wrote: > > ---- 在 星期四, 2020-04-16 03:19:50 Miklos Szeredi <miklos@xxxxxxxxxx> 撰写 ---- > > > On Mon, Feb 10, 2020 at 4:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > > > +void ovl_evict_inode(struct inode *inode) > > > > +{ > > > > + struct ovl_inode *oi = OVL_I(inode); > > > > + struct ovl_write_inode_work ovl_wiw; > > > > + DEFINE_WAIT_BIT(wait, &oi->flags, OVL_WRITE_INODE_PENDING); > > > > + wait_queue_head_t *wqh; > > > > + > > > > + if (ovl_inode_upper(inode)) { > > > > + if (current->flags & PF_MEMALLOC) { > > > > + spin_lock(&inode->i_lock); > > > > + ovl_set_flag(OVL_WRITE_INODE_PENDING, inode); > > > > + wqh = bit_waitqueue(&oi->flags, > > > > + OVL_WRITE_INODE_PENDING); > > > > + prepare_to_wait(wqh, &wait.wq_entry, > > > > + TASK_UNINTERRUPTIBLE); > > > > + spin_unlock(&inode->i_lock); > > > > + > > > > + ovl_wiw.inode = inode; > > > > + INIT_WORK(&ovl_wiw.work, ovl_write_inode_work_fn); > > > > + schedule_work(&ovl_wiw.work); > > > > + > > > > + schedule(); > > > > + finish_wait(wqh, &wait.wq_entry); > > > > > > What is the reason to do this in another thread if this is a PF_MEMALLOC task? > > > > Some underlying filesystems(for example ext4) check the flag in > > ->write_inode() and treate it as an abnormal case.(warn and return) > > > > ext4_write_inode(): > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) || > > sb_rdonly(inode->i_sb)) > > return 0; > > > > overlayfs inodes are always keeping clean even after wring/modifying > > upperfile , so they are right target of kswapd but in the point of lower > > layer, ext4 just thinks kswapd is choosing a wrong dirty inode to reclam > > memory. > > In ext4, it isn't a big problem if ext4_write_inode() is called from > kswapd. But if ext4_write_inode() is called from direct reclaim (which also > sets PF_MEMALLOC) we can deadlock because we may wait for transaction > commit and transaction commit may require locks (such as page lock or > waiting for page writeback to complete) which are held by the task > currently in direct reclaim. Your push to workqueue will silence the > warning but does not solve the possible deadlock. > > I'm actually not sure why you need to writeback the upper inode when > reclaiming overlayfs inode. Why not leave it on flush worker on upper fs? > Because it is the last chance we can sync dirty upper inode, I mean after evicting overlayfs inode we can not find the associated dirty upper inode from any list and that dirty upper inode will be skipped from the target of syncfs(). Thanks, cgxu