Re: [PATCH] vfs: Don't evict inode under the inode lru traversing context

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2024/8/6 0:10, Mateusz Guzik 写道:
On Mon, Aug 5, 2024 at 3:24 AM Zhihao Cheng <chengzhihao@xxxxxxxxxxxxxxx> wrote:

From: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>

The inode reclaiming process(See function prune_icache_sb) collects all
reclaimable inodes and mark them with I_FREEING flag at first, at that
time, other processes will be stuck if they try getting these inodes
(See function find_inode_fast), then the reclaiming process destroy the
inodes by function dispose_list(). Some filesystems(eg. ext4 with
ea_inode feature, ubifs with xattr) may do inode lookup in the inode
evicting callback function, if the inode lookup is operated under the
inode lru traversing context, deadlock problems may happen.

[...]

I only have some tidy-ups with stuff I neglected to mention when
typing up my proposal.

---
  fs/inode.c         | 37 +++++++++++++++++++++++++++++++++++--
  include/linux/fs.h |  5 +++++
  2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 86670941884b..f1c6e8072f39 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -488,6 +488,36 @@ static void inode_lru_list_del(struct inode *inode)
                 this_cpu_dec(nr_unused);
  }

+static void inode_lru_isolating(struct inode *inode)
+{

         lockdep_assert_held(&inode->i_lock);

Adopt, will change in v2.

+       BUG_ON(inode->i_state & (I_LRU_ISOLATING | I_FREEING | I_WILL_FREE));
+       inode->i_state |= I_LRU_ISOLATING;
+}
+
+static void inode_lru_finish_isolating(struct inode *inode)
+{
+       spin_lock(&inode->i_lock);
+       BUG_ON(!(inode->i_state & I_LRU_ISOLATING));
+       inode->i_state &= ~I_LRU_ISOLATING;
+       wake_up_bit(&inode->i_state, __I_LRU_ISOLATING);
+       spin_unlock(&inode->i_lock);
+}
+
+static void inode_wait_for_lru_isolating(struct inode *inode)
+{
+       DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING);
+       wait_queue_head_t *wqh;
+

Top of evict() asserts on I_FREEING being set, used to decide it's not
legit to pin above. This dependency can be documented it in the
routine as well:

BUG_ON(!(inode->i_state & I_FREEING));

Sorry, I don't understand it, do you mean add the 'BUG_ON(!(inode->i_state & I_FREEING))' in inode_wait_for_lru_isolating()? Just like you talked, evict() already has one. So far, the inode_wait_for_lru_isolating() is called only in evict(), we can add the BUG_ON if it is called more than one places in future.>
+       spin_lock(&inode->i_lock);

This lock acquire is avoidable, which is always nice to do for
single-threaded perf. Probably can be also done for the writeback code
below. Maybe I'll massage it myself after the patch lands.

Yes, maybe inode_wait_for_writeback() and inode_wait_for_lru_isolating() can be lockless, and a smp_rmb() is needed, I think.

+       wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING);
+       while (inode->i_state & I_LRU_ISOLATING) {
+               spin_unlock(&inode->i_lock);
+               __wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE);
+               spin_lock(&inode->i_lock);
+       }
+       spin_unlock(&inode->i_lock);
+}

So new arrivals *are* blocked by this point thanks to I_FREEING being
set on entry to evict(). This also means the flag can show up at most
once.

Thus instead of looping this should merely go to sleep once and assert
the I_LRU_ISOLATING flag is no longer set after waking up.

Good improvement, will change in v2.

+
  /**
   * inode_sb_list_add - add inode to the superblock list of inodes
   * @inode: inode to add
@@ -657,6 +687,9 @@ static void evict(struct inode *inode)

         inode_sb_list_del(inode);

+       /* Wait for LRU isolating to finish. */

I don't think this comment adds anything given the name of the func.

Adopt, will be deleted in v2.

+       inode_wait_for_lru_isolating(inode);
+
         /*
          * Wait for flusher thread to be done with the inode so that filesystem
          * does not start destroying it while writeback is still running. Since
@@ -855,7 +888,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
          * be under pressure before the cache inside the highmem zone.
          */
         if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
-               __iget(inode);
+               inode_lru_isolating(inode);
                 spin_unlock(&inode->i_lock);
                 spin_unlock(lru_lock);
                 if (remove_inode_buffers(inode)) {
@@ -867,7 +900,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
                                 __count_vm_events(PGINODESTEAL, reap);
                         mm_account_reclaimed_pages(reap);
                 }
-               iput(inode);
+               inode_lru_finish_isolating(inode);
                 spin_lock(lru_lock);
                 return LRU_RETRY;
         }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..fb0426f349fc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2392,6 +2392,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
   *
   * I_PINNING_FSCACHE_WB        Inode is pinning an fscache object for writeback.
   *
+ * I_LRU_ISOLATING     Inode is pinned being isolated from LRU without holding
+ *                     i_count.
+ *
   * Q: What is the difference between I_WILL_FREE and I_FREEING?
   */
  #define I_DIRTY_SYNC           (1 << 0)
@@ -2415,6 +2418,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  #define I_DONTCACHE            (1 << 16)
  #define I_SYNC_QUEUED          (1 << 17)
  #define I_PINNING_NETFS_WB     (1 << 18)
+#define __I_LRU_ISOLATING      19
+#define I_LRU_ISOLATING                (1 << __I_LRU_ISOLATING)

  #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
  #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
--
2.39.2








[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux