On Fri 17-04-20 09:23:07, Chengguang Xu wrote:
> ---- On Thu, 2020-04-16 22:33:49 Jan Kara <jack@xxxxxxx> wrote ----
> > On Thu 16-04-20 21:52:27, Chengguang Xu wrote:
> > > ---- On Thu, 2020-04-16 19:14:24 Jan Kara <jack@xxxxxxx> wrote ----
> > > > On Thu 16-04-20 14:08:59, Chengguang Xu wrote:
> > > > > ---- On Thu, 2020-04-16 03:19:50 Miklos Szeredi <miklos@xxxxxxxxxx> wrote ----
> > > > > > On Mon, Feb 10, 2020 at 4:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
> > > > > > > +void ovl_evict_inode(struct inode *inode)
> > > > > > > +{
> > > > > > > +	struct ovl_inode *oi = OVL_I(inode);
> > > > > > > +	struct ovl_write_inode_work ovl_wiw;
> > > > > > > +	DEFINE_WAIT_BIT(wait, &oi->flags, OVL_WRITE_INODE_PENDING);
> > > > > > > +	wait_queue_head_t *wqh;
> > > > > > > +
> > > > > > > +	if (ovl_inode_upper(inode)) {
> > > > > > > +		if (current->flags & PF_MEMALLOC) {
> > > > > > > +			spin_lock(&inode->i_lock);
> > > > > > > +			ovl_set_flag(OVL_WRITE_INODE_PENDING, inode);
> > > > > > > +			wqh = bit_waitqueue(&oi->flags,
> > > > > > > +					    OVL_WRITE_INODE_PENDING);
> > > > > > > +			prepare_to_wait(wqh, &wait.wq_entry,
> > > > > > > +					TASK_UNINTERRUPTIBLE);
> > > > > > > +			spin_unlock(&inode->i_lock);
> > > > > > > +
> > > > > > > +			ovl_wiw.inode = inode;
> > > > > > > +			INIT_WORK(&ovl_wiw.work, ovl_write_inode_work_fn);
> > > > > > > +			schedule_work(&ovl_wiw.work);
> > > > > > > +
> > > > > > > +			schedule();
> > > > > > > +			finish_wait(wqh, &wait.wq_entry);
> > > > > >
> > > > > > What is the reason to do this in another thread if this is a
> > > > > > PF_MEMALLOC task?
> > > > >
> > > > > Some underlying filesystems (for example ext4) check the flag in
> > > > > ->write_inode() and treat it as an abnormal case (warn and return):
> > > > >
> > > > > ext4_write_inode():
> > > > >	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) ||
> > > > >	    sb_rdonly(inode->i_sb))
> > > > >		return 0;
> > > > >
> > > > > overlayfs inodes always stay clean even after writing/modifying the
> > > > > upper file, so they are a natural target for kswapd, but from the
> > > > > lower layer's point of view, ext4 just thinks kswapd is choosing a
> > > > > wrong dirty inode from which to reclaim memory.
> > > >
> > > > In ext4, it isn't a big problem if ext4_write_inode() is called from
> > > > kswapd. But if ext4_write_inode() is called from direct reclaim (which
> > > > also sets PF_MEMALLOC) we can deadlock, because we may wait for a
> > > > transaction commit, and the transaction commit may require locks (such
> > > > as a page lock, or waiting for page writeback to complete) which are
> > > > held by the task currently in direct reclaim. Your push to a workqueue
> > > > will silence the warning but does not solve the possible deadlock.
> > > >
> > > > I'm actually not sure why you need to write back the upper inode when
> > > > reclaiming the overlayfs inode. Why not leave it to the flush worker on
> > > > the upper fs?
> > >
> > > Because it is the last chance we can sync the dirty upper inode. I mean,
> > > after evicting the overlayfs inode we cannot find the associated dirty
> > > upper inode from any list, and that dirty upper inode will be skipped by
> > > syncfs().
> >
> > I see. But this flushing of dirty inodes on reclaim really isn't a great
> > idea. It can also stall reclaim (due to it being stuck waiting for IO) and
> > thus lead to bad behavior in low memory situations. It's better to just
> > skip reclaiming such inodes - but then I agree it's a difficult question
> > when to reclaim them.
> > Ideally you'd need to hook into inode_lru_isolate()
> > but that's just too ugly (but maybe it could be done in some clean manner).
>
> How about preparing another list to organize this kind of inode: when
> evicting the overlay inode we grab a reference to the dirty upper inode,
> put an entry (a field pointing to the upper inode) into this new list and
> return; after that, we periodically check the list entries to release
> (iput) clean upper inodes.
>
> In overlayfs's syncfs(), we also need to iterate this new list after
> iterating oi->upper_inodes_list.

Sure, that would work.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
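[Editor's note: the proposed bookkeeping above (grab a reference at evict time, queue the upper inode on a new list, periodically iput the ones that have become clean) can be sketched outside the kernel. The following is a minimal userspace C analog, not the actual patch: `upper_inode`, `queue_dirty_upper`, `scan_upper_dirty_list`, and `iput_sim` are hypothetical names, the refcount stands in for igrab()/iput(), and no locking is shown.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the upper inode; in the kernel this would be a
 * struct inode pinned with igrab() when the overlay inode is evicted. */
struct upper_inode {
	int refcount;
	bool dirty;
};

/* One entry on the proposed "dirty upper inodes" list. */
struct upper_entry {
	struct upper_inode *inode;
	struct upper_entry *next;
};

static struct upper_entry *upper_dirty_list;

/* iput() analog: drop the reference taken when the entry was queued. */
static void iput_sim(struct upper_inode *inode)
{
	if (--inode->refcount == 0)
		free(inode);
}

/* Called from evict: take a reference and queue the upper inode so a
 * later syncfs()/periodic scan can still reach it after the overlay
 * inode itself is gone. */
static void queue_dirty_upper(struct upper_inode *inode)
{
	struct upper_entry *e = malloc(sizeof(*e));

	inode->refcount++;	/* igrab() analog */
	e->inode = inode;
	e->next = upper_dirty_list;
	upper_dirty_list = e;
}

/* Periodic scan: release (iput) upper inodes that have become clean,
 * keep the still-dirty ones queued. */
static void scan_upper_dirty_list(void)
{
	struct upper_entry **pp = &upper_dirty_list;

	while (*pp) {
		struct upper_entry *e = *pp;

		if (!e->inode->dirty) {
			*pp = e->next;	/* unlink clean inode's entry */
			iput_sim(e->inode);
			free(e);
		} else {
			pp = &e->next;
		}
	}
}
```

A syncfs() implementation would then walk both oi->upper_inodes_list and this list, which is exactly the extra iteration mentioned above.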