On Mon, Sep 20, 2021 at 09:54:31AM +0100, Mel Gorman wrote:
> Cc list similar to "congestion_wait() and GFP_NOFAIL" as they're loosely
> related.
>
> This is a prototype series that removes all calls to congestion_wait
> in mm/ and deletes wait_iff_congested. It's not a clever
> implementation but congestion_wait has been broken for a long time
> (https://lore.kernel.org/linux-mm/45d8b7a6-8548-65f5-cccf-9f451d4ae3d4@xxxxxxxxx/).
> Even if it worked, it was never a great idea. While excessive
> dirty/writeback pages at the tail of the LRU is one possibility that
> reclaim may be slow, there is also the problem of too many pages being
> isolated and reclaim failing for other reasons (elevated references,
> too many pages isolated, excessive LRU contention etc).
>
> This series replaces the reclaim conditions with event driven ones
>
> o If there are too many dirty/writeback pages, sleep until a timeout
>   or enough pages get cleaned
> o If too many pages are isolated, sleep until enough isolated pages
>   are either reclaimed or put back on the LRU
> o If no progress is being made, let direct reclaim tasks sleep until
>   another task makes progress
>
> This has been lightly tested only and the testing was useless as the
> relevant code was not executed. The workload configurations I had that
> used to trigger these corner cases no longer work (yey?) and I'll need
> to implement a new synthetic workload. If someone is aware of a realistic
> workload that forces reclaim activity to the point where reclaim stalls
> then kindly share the details.

Got a git tree pointer so I can pull it into a test kernel so I can see
what impact it has on behaviour before I try to make sense of the code?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx