On Thu, 6 Feb 2020 16:08:53 -0800 Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> On Fri, Feb 07, 2020 at 10:19:28AM +1100, Dave Chinner wrote:
> > But detecting an abundance of dirty pages/inodes on the LRU doesn't
> > really solve the problem of determining if and/or how long we should
> > wait for IO before we try to free more objects. There is no problem
> > with having lots of dirty pages/inodes on the LRU as long as the IO
> > subsystem keeps up with the rate at which reclaim is asking them to
> > be written back via async mechanisms (bdi writeback, metadata
> > writeback, etc).
> >
> > The problem comes when we cannot make efficient progress cleaning
> > pages/inodes on the LRU because the IO subsystem is overloaded and
> > cannot clean pages/inodes any faster. At this point, we have to wait
> > for the IO subsystem to make progress and, without feedback from the
> > IO subsystem, we have no idea how fast that progress is made. Hence
> > we have no idea how long we need to wait before trying to reclaim
> > again, i.e. the answer can be different depending on hardware
> > behaviour, not just the current instantaneous reclaim and IO state.
> >
> > That's the fundamental problem we need to solve, and realistically
> > it can only be done with some level of feedback from the IO
> > subsystem.
>
> That triggered a memory for me. Jeremy Kerr presented a paper at LCA2006
> on a different model where the device driver pulls dirty things from the VM
> rather than having the VM push dirty things to the device driver. It was
> prototyped in K42 rather than Linux, but the idea might be useful.
>
> http://jk.ozlabs.org/projects/k42/
> http://jk.ozlabs.org/projects/k42/device-driven-IO-lca06.pdf

Fun. Device drivers say "I have spare bandwidth, so send me some stuff".
But if device drivers could do that, we wouldn't have broken congestion
in the first place ;)