Re: [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 22 Sep 2021 06:46:21 +1000

On Mon, Sep 20, 2021 at 09:54:31AM +0100, Mel Gorman wrote:
> Cc list similar to "congestion_wait() and GFP_NOFAIL" as they're loosely
> related.
> 
> This is a prototype series that removes all calls to congestion_wait
> in mm/ and deletes wait_iff_congested. It's not a clever
> implementation but congestion_wait has been broken for a long time
> (https://lore.kernel.org/linux-mm/45d8b7a6-8548-65f5-cccf-9f451d4ae3d4@xxxxxxxxx/).
> Even if it worked, it was never a great idea. While excessive
> dirty/writeback pages at the tail of the LRU are one possible reason
> that reclaim may be slow, reclaim can also fail for other reasons
> (elevated references, too many pages isolated, excessive LRU
> contention, etc.).
> 
> This series replaces the reclaim conditions with event-driven ones:
> 
> o If there are too many dirty/writeback pages, sleep until a timeout
>   or enough pages get cleaned
> o If too many pages are isolated, sleep until enough isolated pages
>   are either reclaimed or put back on the LRU
> o If no progress is being made, let direct reclaim tasks sleep until
>   another task makes progress
> 
> This has been lightly tested only and the testing was useless as the
> relevant code was not executed. The workload configurations I had that
> used to trigger these corner cases no longer work (yey?) and I'll need
> to implement a new synthetic workload. If someone is aware of a realistic
> workload that forces reclaim activity to the point where reclaim stalls
> then kindly share the details.

Got a git tree pointer? I'd like to pull this into a test kernel and
see what impact it has on behaviour before I try to make sense of
the code.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx