On Thu, Feb 29, 2024 at 12:29:35PM +1100, Dave Chinner wrote:
> On Wed, Feb 28, 2024 at 07:37:58PM +0000, Matthew Wilcox wrote:
> > On Tue, Feb 27, 2024 at 09:19:47PM +0200, Amir Goldstein wrote:
> > > On Tue, Feb 27, 2024 at 8:56 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > Hello!
> > > >
> > > > Recent discussions [1] suggest that greater mutual understanding between
> > > > memory reclaim on the one hand and RCU on the other might be in order.
> > > >
> > > > One possibility would be an open discussion. If it would help, I would
> > > > be happy to describe how RCU reacts and responds to heavy load, along with
> > > > some ways that RCU's reactions and responses could be enhanced if needed.
> > > >
> > >
> > > Adding fsdevel as this should probably be a cross track session.
> >
> > Perhaps broaden this slightly. On the THP Cabal call we just had a
> > conversation about the requirements on filesystems in the writeback
> > path. We currently tell filesystem authors that the entire writeback
> > path must avoid allocating memory in order to prevent deadlock (or use
> > GFP_MEMALLOC). Is this appropriate?
>
> The reality is that filesystem developers have been ignoring that
> "mm rule" for a couple of decades. It was also discussed at LSFMM a
> decade ago (2014 IIRC) without resolution, so in the mean time we
> just took control of our own destiny....
>
> > It's a lot of work to assure that
> > writing pagecache back will not allocate memory in, eg, the network stack,
> > the device driver, and any other layers the write must traverse.
> >
> > With the removal of ->writepage from vmscan, perhaps we can make
> > filesystem authors lives easier by relaxing this requirement as pagecache
> > should be cleaned long before we get to reclaiming it.
>
> .... by removing memory reclaim page cache writeback support from
> the filesystems entirely.
>
> IOWs, this rule hasn't been valid for a -long- time, so maybe it
> is time to remove it.
> :)

It's _never_ been valid; the entire IO stack allocates memory. This is
what GFP_NOIO/GFP_NOFS are for, along with mempools/biosets. If mm can't
satisfy the allocation, it should fail it, and then the IO path will use
its fallbacks (but they will be slow, i.e. iodepth will be greatly
limited).
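To make the "fallbacks are slow" point concrete, here is a toy userspace model. All names are hypothetical; this is not the kernel's mempool_t API, though it roughly mirrors how mempool_alloc()/mempool_free() use a reserve of min_nr preallocated elements. The normal allocator is tried first; when mm fails it (simulated by the under_pressure flag), allocations come from the reserve, so the number of IOs that can be in flight at once is capped by the reserve size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a mempool: a fixed reserve of min_nr
 * preallocated elements, used only when the normal allocator fails. */
struct toy_mempool {
	void **reserve;   /* stack of preallocated elements */
	size_t nr_free;   /* elements currently sitting in the reserve */
	size_t min_nr;    /* reserve size guaranteed at init time */
	size_t elem_size;
};

/* Stand-in for the page allocator: 'under_pressure' simulates a
 * GFP_NOIO allocation that mm refuses to satisfy. */
static void *backing_alloc(size_t size, bool under_pressure)
{
	return under_pressure ? NULL : malloc(size);
}

void toy_mempool_destroy(struct toy_mempool *pool)
{
	while (pool->nr_free)
		free(pool->reserve[--pool->nr_free]);
	free(pool->reserve);
	pool->reserve = NULL;
}

int toy_mempool_init(struct toy_mempool *pool, size_t min_nr, size_t elem_size)
{
	pool->nr_free = 0;
	pool->min_nr = min_nr;
	pool->elem_size = elem_size;
	pool->reserve = calloc(min_nr, sizeof(void *));
	if (!pool->reserve)
		return -1;
	/* Fill the reserve up front, while memory is still available. */
	for (size_t i = 0; i < min_nr; i++) {
		void *elem = malloc(elem_size);
		if (!elem) {
			toy_mempool_destroy(pool);
			return -1;
		}
		pool->reserve[pool->nr_free++] = elem;
	}
	return 0;
}

/* Fast path: the normal allocator. Slow path: the reserve. Returns NULL
 * when both are exhausted; the kernel's mempool_alloc() would instead
 * sleep until an in-flight IO completes and frees an element. */
void *toy_mempool_alloc(struct toy_mempool *pool, bool under_pressure)
{
	void *elem = backing_alloc(pool->elem_size, under_pressure);
	if (elem)
		return elem;
	if (pool->nr_free)
		return pool->reserve[--pool->nr_free];
	return NULL;
}

/* Refill the reserve first; only surplus elements go back to the heap. */
void toy_mempool_free(struct toy_mempool *pool, void *elem)
{
	assert(elem != NULL);
	if (pool->nr_free < pool->min_nr)
		pool->reserve[pool->nr_free++] = elem;
	else
		free(elem);
}

/* How many IOs could be in flight at once if each needs one element?
 * Allocates until exhaustion (or cap), then frees everything. */
size_t max_inflight(struct toy_mempool *pool, bool under_pressure, size_t cap)
{
	void **inflight = calloc(cap, sizeof(void *));
	size_t n = 0;
	if (!inflight)
		return 0;
	while (n < cap && (inflight[n] = toy_mempool_alloc(pool, under_pressure)))
		n++;
	for (size_t i = 0; i < n; i++)
		toy_mempool_free(pool, inflight[i]);
	free(inflight);
	return n;
}
```

With a reserve of 4, max_inflight() reaches the cap when memory is plentiful but drops to 4 once every normal allocation fails: the IO path still makes forward progress without deadlocking, just at a much lower iodepth.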