On Thu 16-04-20 21:52:27, Chengguang Xu wrote: > ---- 在 星期四, 2020-04-16 19:14:24 Jan Kara <jack@xxxxxxx> 撰写 ---- > > On Thu 16-04-20 14:08:59, Chengguang Xu wrote: > > > ---- 在 星期四, 2020-04-16 03:19:50 Miklos Szeredi <miklos@xxxxxxxxxx> 撰写 ---- > > > > On Mon, Feb 10, 2020 at 4:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote: > > > > > +void ovl_evict_inode(struct inode *inode) > > > > > +{ > > > > > + struct ovl_inode *oi = OVL_I(inode); > > > > > + struct ovl_write_inode_work ovl_wiw; > > > > > + DEFINE_WAIT_BIT(wait, &oi->flags, OVL_WRITE_INODE_PENDING); > > > > > + wait_queue_head_t *wqh; > > > > > + > > > > > + if (ovl_inode_upper(inode)) { > > > > > + if (current->flags & PF_MEMALLOC) { > > > > > + spin_lock(&inode->i_lock); > > > > > + ovl_set_flag(OVL_WRITE_INODE_PENDING, inode); > > > > > + wqh = bit_waitqueue(&oi->flags, > > > > > + OVL_WRITE_INODE_PENDING); > > > > > + prepare_to_wait(wqh, &wait.wq_entry, > > > > > + TASK_UNINTERRUPTIBLE); > > > > > + spin_unlock(&inode->i_lock); > > > > > + > > > > > + ovl_wiw.inode = inode; > > > > > + INIT_WORK(&ovl_wiw.work, ovl_write_inode_work_fn); > > > > > + schedule_work(&ovl_wiw.work); > > > > > + > > > > > + schedule(); > > > > > + finish_wait(wqh, &wait.wq_entry); > > > > > > > > What is the reason to do this in another thread if this is a PF_MEMALLOC task? > > > > > > Some underlying filesystems(for example ext4) check the flag in > > > ->write_inode() and treate it as an abnormal case.(warn and return) > > > > > > ext4_write_inode(): > > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) || > > > sb_rdonly(inode->i_sb)) > > > return 0; > > > > > > overlayfs inodes are always keeping clean even after wring/modifying > > > upperfile , so they are right target of kswapd but in the point of lower > > > layer, ext4 just thinks kswapd is choosing a wrong dirty inode to reclam > > > memory. > > > > In ext4, it isn't a big problem if ext4_write_inode() is called from > > kswapd. But if ext4_write_inode() is called from direct reclaim (which also > > sets PF_MEMALLOC) we can deadlock because we may wait for transaction > > commit and transaction commit may require locks (such as page lock or > > waiting for page writeback to complete) which are held by the task > > currently in direct reclaim. Your push to workqueue will silence the > > warning but does not solve the possible deadlock. > > > > I'm actually not sure why you need to writeback the upper inode when > > reclaiming overlayfs inode. Why not leave it on flush worker on upper fs? > > > > Because it is the last chance we can sync dirty upper inode, I mean after > evicting overlayfs inode we can not find the associated dirty upper inode > from any list and that dirty upper inode will be skipped from the target > of syncfs(). I see. But this flushing of dirty inodes on reclaim really isn't a great idea. It can also stall reclaim (due to it being stuck waiting for IO) and thus lead to bad behavior in low memory situations. It's better to just skip reclaiming such inodes - but then I agree it's a difficult question when to reclaim them. Ideally you'd need to hook into inode_lru_isolate() but that's just too ugly (but maybe it could be done in some clean manner). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR