On Fri, Jul 19, 2024 at 11:21:51AM +0800, Zhihao Cheng wrote: > 在 2024/7/18 21:40, Jan Kara 写道: > > I'm pondering about the best way to fix this. Maybe we could handle the > > need for inode pinning in inode_lru_isolate() in a similar way as in > > writeback code so that last iput() cannot happen from inode_lru_isolate(). > > In writeback we use I_SYNC flag to pin the inode and evict() waits for this > > flag to clear. I'll probably sleep to it and if I won't find it too > > disgusting to live tomorrow, I can code it. > > > > I guess that you may modify like this: > diff --git a/fs/inode.c b/fs/inode.c > index f356fe2ec2b6..5b1a9b23f53f 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -457,7 +457,7 @@ EXPORT_SYMBOL(ihold); > > static void __inode_add_lru(struct inode *inode, bool rotate) > { > - if (inode->i_state & (I_DIRTY_ALL | I_SYNC | I_FREEING | > I_WILL_FREE)) > + if (inode->i_state & (I_DIRTY_ALL | I_SYNC | I_FREEING | I_WILL_FREE > | I_PINING)) > return; > if (atomic_read(&inode->i_count)) > return; > @@ -845,7 +845,7 @@ static enum lru_status inode_lru_isolate(struct > list_head *item, > * be under pressure before the cache inside the highmem zone. > */ > if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) { > - __iget(inode); > + inode->i_state |= I_PINING; > spin_unlock(&inode->i_lock); > spin_unlock(lru_lock); > if (remove_inode_buffers(inode)) { > @@ -857,7 +857,10 @@ static enum lru_status inode_lru_isolate(struct > list_head *item, > __count_vm_events(PGINODESTEAL, reap); > mm_account_reclaimed_pages(reap); > } > - iput(inode); > + spin_lock(&inode->i_lock); > + inode->i_state &= ~I_PINING; > + wake_up_bit(&inode->i_state, __I_PINING); > + spin_unlock(&inode->i_lock); > spin_lock(lru_lock); > return LRU_RETRY; > } > @@ -1772,6 +1775,7 @@ static void iput_final(struct inode *inode) > return; > } > > + inode_wait_for_pining(inode); > state = inode->i_state; > if (!drop) { > WRITE_ONCE(inode->i_state, state | I_WILL_FREE); > diff --git a/include/linux/fs.h b/include/linux/fs.h > index fd34b5755c0b..daf094fff5fe 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2415,6 +2415,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, > struct kiocb *kiocb_src, > #define I_DONTCACHE (1 << 16) > #define I_SYNC_QUEUED (1 << 17) > #define I_PINNING_NETFS_WB (1 << 18) > +#define __I_PINING 19 > +#define I_PINING (1 << __I_PINING) > > #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC) > #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES) > > , which means that we will import a new inode state to solve the problem. > My non-maintainer $0,03 is as follows: 1. I_PINING is too generic of a name. I_LRU_PINNED or something else indicating what this is for would be prudent 2. while not specific to this patch, the handling of i_state is too accidental-breakage friendly. a full blown solution is way out of the scope here, but something can be done to future-proof this work anyway. To that end I would suggest: 1. inode_lru_pin() which appart from setting the flag includes: BUG_ON(inode->i_state & (I_LRU_PINNED | I_FREEING | I_WILL_FREE) 2. inode_lru_unpin() which apart from unsetting the flag + wakeup includes: BUG_ON(!(inode->i_state & I_LRU_PINNED)) 3. inode_lru_wait_for_pinned() However, a non-cosmetic remark is that at the spot inode_wait_for_pining gets invoked none of the of the pinning-blocking flags may be set (to my reading anyway). This is not the end of the world, but it does mean the waiting routine will have to check stuff in a loop. Names are not that important, the key is to keep the logic and dependencies close by code-wise.