On Wed, Aug 04, 2021 at 07:07:39PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> Now that we defer inode inactivation, we've decoupled the process of
> unlinking or closing an inode from the process of inactivating it. In
> theory this should lead to better throughput since we now inactivate
> the queued inodes in batches instead of one at a time.
>
> Unfortunately, one of the primary risks with this decoupling is the
> loss of rate control feedback between the frontend and background
> threads. In other words, a rm -rf /* thread can run the system out of
> memory if it can queue inodes for inactivation and jump to a new CPU
> faster than the background threads can actually clear the deferred
> work. The workers can get scheduled off the CPU if they have to do
> IO, etc.
>
> To solve this problem, we configure a shrinker so that it will
> activate the /second/ time the shrinkers are called. The custom
> shrinker will queue all percpu deferred inactivation workers
> immediately and set a flag to force frontend callers who are
> releasing a vfs inode to wait for the inactivation workers.
>
> On my test VM with 560M of RAM and a 2TB filesystem, this seems to
> solve most of the OOMing problem when deleting 10 million inodes.
>
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_icache.c |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_icache.h |    1 +
>  fs/xfs/xfs_mount.c  |    9 ++++-
>  fs/xfs/xfs_mount.h  |    3 ++
>  fs/xfs/xfs_trace.h  |   37 ++++++++++++++++++-
>  5 files changed, 147 insertions(+), 5 deletions(-)

I'm still not really convinced this is the right way to go here, but it
doesn't hurt much, so let's run with it for now. When I rework the inode
reclaim shrinker hooks I'll revisit this.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
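
[Editorial note: for readers who don't have the patch in front of them, the
sketch below illustrates the general shape of the mechanism the quoted commit
message describes: a shrinker with seeks == 0 whose count callback reports
half of an explicit batch size, so that vmscan defers the first pass and only
invokes the scan callback on a later one; the scan callback then kicks every
per-cpu inactivation worker and raises a throttle flag for frontend callers.
All of the hypo_* names, the struct layout, and the specific count/batch
values are assumptions made for illustration; this is not the code from the
patch.]

/*
 * Illustrative sketch only -- not the code from this patch.  All of the
 * hypo_* names and the specific numbers are assumptions.
 */
#include <linux/shrinker.h>
#include <linux/workqueue.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>

#define HYPO_INODEGC_COUNT	32	/* value reported by ->count_objects */
#define HYPO_INODEGC_BATCH	64	/* shrinker batch size */

struct hypo_inodegc {
	struct work_struct	work;	/* per-cpu inactivation worker */
	int			items;	/* inodes queued on this cpu */
};

struct hypo_mount {
	struct shrinker			shrinker;
	struct hypo_inodegc __percpu	*inodegc;
	struct workqueue_struct		*inodegc_wq;
	bool				throttle; /* frontends must wait */
};

static unsigned long
hypo_inodegc_count(struct shrinker *shrink, struct shrink_control *sc)
{
	/*
	 * With seeks == 0, vmscan adds freeable / 2 to the deferred work
	 * on each pass and only calls ->scan_objects once the total
	 * reaches the batch size or the freeable count.  Reporting half
	 * the batch size therefore skips the first pass and fires the
	 * scan on (roughly) the second one, which is the behaviour the
	 * commit message describes.  A real implementation would also
	 * return 0 here when no inodes are queued at all.
	 */
	return HYPO_INODEGC_COUNT;
}

static unsigned long
hypo_inodegc_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	struct hypo_mount	*mp = container_of(shrink, struct hypo_mount,
						   shrinker);
	int			cpu;

	/* Make frontend callers releasing vfs inodes wait for the workers. */
	WRITE_ONCE(mp->throttle, true);

	/* Kick every per-cpu deferred inactivation worker immediately. */
	for_each_online_cpu(cpu) {
		struct hypo_inodegc *gc = per_cpu_ptr(mp->inodegc, cpu);

		if (gc->items)
			queue_work_on(cpu, mp->inodegc_wq, &gc->work);
	}

	/* No memory is freed directly from the shrinker context. */
	return 0;
}

static int
hypo_inodegc_register_shrinker(struct hypo_mount *mp)
{
	mp->shrinker.count_objects = hypo_inodegc_count;
	mp->shrinker.scan_objects = hypo_inodegc_scan;
	mp->shrinker.seeks = 0;
	mp->shrinker.batch = HYPO_INODEGC_BATCH;

	/* register_shrinker() as of the 5.14-era API this thread targets. */
	return register_shrinker(&mp->shrinker);
}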