From: Dave Chinner <dchinner@xxxxxxxxxx>

Add a new superblock method for iterating all cached inodes in the
inode cache. This will be used to replace the explicit sb->s_inodes
iteration. The caller supplies a callback function and a private data
pointer that is passed to the callback along with each inode that is
iterated.

There are two iteration functions provided. The first is the interface
that everyone should be using. It provides a valid, unlocked and
referenced inode on which any inode operation (including blocking
operations) is allowed. The iterator infrastructure is responsible for
lifecycle management, hence the subsystem callback only needs to
implement the operation it wants to perform on all inodes.

The second iterator interface is the unsafe variant for internal VFS
use only. It simply iterates all VFS inodes without guaranteeing any
state or taking references. This iteration is done under an RCU read
lock to ensure that the VFS inode is not freed from under the callback.
If the operation wishes to block, it must drop the RCU context after
guaranteeing that the inode will not get freed. This unsafe iteration
mechanism is needed for operations that need tight control over the
state of the inodes they operate on.

This mechanism allows the existing sb->s_inodes iteration models to be
maintained, allowing a generic implementation for iterating all cached
inodes on the superblock to be provided.

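As a rough illustration of the callback contract described above, the
sketch below models it in userspace. It is not part of this patch and
is not kernel code: the struct definitions, model_iter_inodes(),
struct size_scan and sum_sizes_cb() are hypothetical stand-ins, and no
locking or reference counting is modelled. Only the INO_ITER_* values
and the ino_iter_fn signature are taken from the patch.

```c
#include <stddef.h>

/* Callback return values, as defined by the patch. */
#define INO_ITER_DONE	0
#define INO_ITER_ABORT	1

/* Hypothetical stand-in for the kernel's struct inode. */
struct inode {
	unsigned long i_ino;
	long long i_size;
};

/* Hypothetical stand-in: a plain array instead of sb->s_inodes. */
struct super_block {
	struct inode *inodes;
	size_t nr_inodes;
};

/* Callback signature, as defined by the patch. */
typedef int (*ino_iter_fn)(struct inode *inode, void *priv);

/*
 * Models only the return value handling of the safe iterator:
 * INO_ITER_ABORT stops the walk without reporting an error, a
 * negative value stops the walk and is returned to the caller.
 */
static int model_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
			     void *private_data)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < sb->nr_inodes; i++) {
		ret = iter_fn(&sb->inodes[i], private_data);
		if (ret == INO_ITER_ABORT)
			return 0;
		if (ret < 0)
			return ret;
	}
	return ret;
}

/* Hypothetical private data handed to the callback. */
struct size_scan {
	long long limit;	/* stop once the running total exceeds this */
	long long total;
	unsigned int visited;
};

/* Example callback: accumulate sizes, abort early once over the limit. */
static int sum_sizes_cb(struct inode *inode, void *priv)
{
	struct size_scan *scan = priv;

	scan->visited++;
	scan->total += inode->i_size;
	return scan->total > scan->limit ? INO_ITER_ABORT : INO_ITER_DONE;
}
```

The callback carries all of its state in the private data pointer, so
the iterator itself stays generic. A caller would fill in a struct
size_scan, pass it with sum_sizes_cb() to the iterator, and read the
totals back once the walk returns.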
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/internal.h      |   2 +
 fs/super.c         | 105 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  12 ++++++
 3 files changed, 119 insertions(+)

diff --git a/fs/internal.h b/fs/internal.h
index 37749b429e80..7039d13980c6 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -127,6 +127,8 @@ struct super_block *user_get_super(dev_t, bool excl);
 void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data);
 
 /*
  * Prepare superblock for changing its read-only state (i.e., either remount
diff --git a/fs/super.c b/fs/super.c
index a16e6a6342e0..20a9446d943a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -167,6 +167,111 @@ static void super_wake(struct super_block *sb, unsigned int flag)
 	wake_up_var(&sb->s_flags);
 }
 
+/**
+ * super_iter_inodes - iterate all the cached inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This function iterates all cached inodes on a superblock that are not in
+ * the process of being initialised or torn down. It will run @iter_fn() with
+ * a valid, referenced inode, so it is safe for the caller to do anything
+ * it wants with the inode except drop the reference the iterator holds.
+ *
+ */
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags)
+{
+	struct inode *inode, *old_inode = NULL;
+	int ret = 0;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		/*
+		 * Skip over zero refcount inode if the caller only wants
+		 * referenced inodes to be iterated.
+		 */
+		if ((flags & INO_ITER_REFERENCED) &&
+		    !atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+		iput(old_inode);
+
+		ret = iter_fn(inode, private_data);
+
+		old_inode = inode;
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+
+		if (ret == INO_ITER_ABORT) {
+			ret = 0;
+			break;
+		}
+		if (ret < 0)
+			break;
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(old_inode);
+	return ret;
+}
+
+/**
+ * super_iter_inodes_unsafe - unsafely iterate all the inodes on a superblock
+ * @sb: superblock to iterate
+ * @iter_fn: callback to run on every inode found.
+ *
+ * This is almost certainly not the function you want. It is for internal VFS
+ * operations only. Please use super_iter_inodes() instead. If you must use
+ * this function, please add a comment explaining why it is necessary and the
+ * locking that makes it safe to use this function.
+ *
+ * This function iterates all cached inodes on a superblock that are attached to
+ * the superblock. It will pass each inode to @iter_fn unlocked and without
+ * having performed any existence checks on it.
+ *
+ * @iter_fn must perform all necessary state checks on the inode itself to
+ * ensure safe operation. super_iter_inodes_unsafe() only guarantees that the
+ * inode exists and won't be freed whilst the callback is running.
+ *
+ * @iter_fn must not block. It is run in an atomic context that is not allowed
+ * to sleep to provide the inode existence guarantees. If the callback needs to
+ * do blocking operations it needs to track the inode itself and defer those
+ * operations until after the iteration completes.
+ *
+ * @iter_fn must provide conditional reschedule checks itself. If rescheduling
+ * or deferred processing is needed, it must return INO_ITER_ABORT to return to
+ * the high level function to perform those operations. It can then restart the
+ * iteration again. The high level code must provide forwards progress
+ * guarantees if they are necessary.
+ *
+ */
+void super_iter_inodes_unsafe(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data)
+{
+	struct inode *inode;
+	int ret;
+
+	rcu_read_lock();
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		ret = iter_fn(inode, private_data);
+		if (ret == INO_ITER_ABORT)
+			break;
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	rcu_read_unlock();
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index eae5b67e4a15..0a6a462c45ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2213,6 +2213,18 @@ enum freeze_holder {
 	FREEZE_MAY_NEST		= (1U << 2),
 };
 
+/* Inode iteration callback return values */
+#define INO_ITER_DONE		0
+#define INO_ITER_ABORT		1
+
+/* Inode iteration control flags */
+#define INO_ITER_REFERENCED	(1U << 0)
+#define INO_ITER_UNSAFE		(1U << 1)
+
+typedef int (*ino_iter_fn)(struct inode *inode, void *priv);
+int super_iter_inodes(struct super_block *sb, ino_iter_fn iter_fn,
+		void *private_data, int flags);
+
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
 	void (*destroy_inode)(struct inode *);
-- 
2.45.2