On Thu 16-04-20 10:00:13, Miklos Szeredi wrote:
> On Thu, Apr 16, 2020 at 9:39 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 16, 2020 at 9:21 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 16, 2020 at 8:09 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
> > > >
> > > > ---- On Thursday, 2020-04-16 03:19:50 Miklos Szeredi <miklos@xxxxxxxxxx> wrote ----
> > > > > On Mon, Feb 10, 2020 at 4:10 AM Chengguang Xu <cgxu519@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > > +	if (current->flags & PF_MEMALLOC) {
> > > > > > +		spin_lock(&inode->i_lock);
> > > > > > +		ovl_set_flag(OVL_WRITE_INODE_PENDING, inode);
> > > > > > +		wqh = bit_waitqueue(&oi->flags,
> > > > > > +				OVL_WRITE_INODE_PENDING);
> > > > > > +		prepare_to_wait(wqh, &wait.wq_entry,
> > > > > > +				TASK_UNINTERRUPTIBLE);
> > > > > > +		spin_unlock(&inode->i_lock);
> > > > > > +
> > > > > > +		ovl_wiw.inode = inode;
> > > > > > +		INIT_WORK(&ovl_wiw.work, ovl_write_inode_work_fn);
> > > > > > +		schedule_work(&ovl_wiw.work);
> > > > > > +
> > > > > > +		schedule();
> > > > > > +		finish_wait(wqh, &wait.wq_entry);
> > > > >
> > > > > What is the reason to do this in another thread if this is a PF_MEMALLOC task?
> > > >
> > > > Some underlying filesystems (for example ext4) check the flag in ->write_inode()
> > > > and treat it as an abnormal case (warn and return).
> > > >
> > > > ext4_write_inode():
> > > >         if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) ||
> > > >                 sb_rdonly(inode->i_sb))
> > > >                 return 0;
> > > >
> > > > overlayfs inodes always stay clean, even after writing/modifying the upper file,
> > > > so they are a valid target for kswapd, but from the point of view of the lower
> > > > layer, ext4 just thinks kswapd is choosing a wrong dirty inode to reclaim memory.
> > >
> > > I don't get it.  Why can't overlayfs just skip the writeback of upper
> > > inode in the reclaim case?  It will be written back through the normal
> > > reclaim channels.
> >
> > And how do we get reclaim on overlay inode at all?  Overlay inodes
> > will get evicted immediately after their refcount drops to zero, so
> > there's absolutely no chance that memory reclaim will encounter them,
> > no?
>
> Spoke too soon.  Obviously this case is about dentry reclaim, not
> inode reclaim.
>
> So how about temporarily clearing PF_MEMALLOC in this case?  Doing
> this in a kernel thread doesn't seem to add any advantages.

Clearing PF_MEMALLOC will not solve the deadlock I've described in my
reply to Chengguang. Ext4 really cannot safely handle data integrity
writeback (which is what write_inode_now(inode, 1) does) from direct
reclaim.

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR