Re: [PATCH RFC tip/core/rcu] Add shrinker to shift to fast/inefficient GP mode

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 7 May 2020 10:29:40 -0700

On Thu, May 07, 2020 at 10:09:03AM -0700, Paul E. McKenney wrote:
> On Thu, May 07, 2020 at 01:00:06PM -0400, Johannes Weiner wrote:
> > On Wed, May 06, 2020 at 05:55:35PM -0700, Andrew Morton wrote:
> > > On Wed, 6 May 2020 17:42:40 -0700 "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
> > > 
> > > > This commit adds a shrinker so as to inform RCU when memory is scarce.
> > > > RCU responds by shifting into the same fast and inefficient mode that is
> > > > used in the presence of excessive numbers of RCU callbacks.  RCU remains
> > > > in this state for one-tenth of a second, though this time window can be
> > > > extended by another call to the shrinker.
> > 
> > We may be able to use shrinkers here, but merely being invoked does
> > not carry a reliable distress signal.
> > 
> > Shrinkers get invoked whenever vmscan runs. It's a useful indicator
> > for when to age an auxiliary LRU list - test references, clear and
> > rotate or reclaim stale entries. The urgency, and what can and cannot
> > be considered "stale", is encoded in the callback frequency and scan
> > counts, and meant to be relative to the VM's own rate of aging: "I've
> > tested X percent of mine for recent use, now you go and test the same
> > share of your pool." It doesn't translate well to other
> > interpretations of the callbacks, although people have tried.
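The "test the same share of your pool" contract can be illustrated with simple proportional arithmetic. This is a simplified model that assumes the VM reports how many of its LRU pages it scanned; the kernel's real slab-shrinking computation differs in detail, and `proportional_scan_target()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: if the VM scanned lru_scanned of its lru_size
 * pages, an auxiliary cache of pool_size objects scans the same
 * fraction of its own pool.
 */
static uint64_t proportional_scan_target(uint64_t lru_scanned,
					 uint64_t lru_size,
					 uint64_t pool_size)
{
	assert(lru_size > 0 && lru_scanned <= lru_size);
	/* Multiply before dividing so small shares are not truncated
	 * to zero; assumes the product fits in 64 bits. */
	return pool_size * lru_scanned / lru_size;
}
```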
> 
> Would it make sense for RCU to interpret two invocations within (say)
> 100ms of each other as indicating urgency?  (Hey, I had to ask!)
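The "two invocations within 100ms" heuristic amounts to remembering when the previous invocation happened. A minimal sketch with hypothetical names, assuming a monotonic millisecond clock; given that shrinker calls track vmscan activity in general, this would be a heuristic at best.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define URGENCY_WINDOW_MS 100 /* the "(say) 100ms" from the question */

/* Hypothetical record of the previous shrinker invocation. */
struct shrink_urgency {
	bool seen;             /* at least one prior invocation recorded */
	uint64_t last_call_ms; /* time of the previous invocation */
};

/* Record an invocation at now_ms and report whether it arrived within
 * URGENCY_WINDOW_MS of the previous one. */
static bool shrink_call_is_urgent(struct shrink_urgency *u, uint64_t now_ms)
{
	bool urgent;

	assert(!u->seen || now_ms >= u->last_call_ms); /* monotonic clock */
	urgent = u->seen && now_ms - u->last_call_ms < URGENCY_WINDOW_MS;
	u->seen = true;
	u->last_call_ms = now_ms;
	return urgent;
}
```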
> 
> > > > If it proves feasible, a later commit might add a function call directly
> > > > indicating the end of the period of scarce memory.
> > > 
> > > (Cc David Chinner, who often has opinions on shrinkers ;))
> > > 
> > > It's a bit abusive of the intent of the slab shrinkers, but I don't
> > > immediately see a problem with it.  Always returning 0 from
> > > ->scan_objects might cause a problem in some situations(?).
> > > 
> > > Perhaps we should have a formal "system getting low on memory, please
> > > do something" notification API.
> > 
> > It's tricky to find a useful definition of what low on memory
> > means. In the past we've used sc->priority cutoffs, the vmpressure
> > interface (reclaimed/scanned, i.e. reclaim efficiency cutoffs), OOM
> > notifiers (another reclaim efficiency cutoff). But none of these
> > reliably capture "distress", and they vary highly between different
> > hardware setups. It can be hard to trigger OOM itself on fast IO
> > devices, even when the machine is way past useful (where useful is
> > somewhat subjective to the user). Userspace OOM implementations that
> > consider userspace health (also subjective) are getting more common.
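To make the "reclaim efficiency cutoff" idea concrete, here is a sketch that turns scanned/reclaimed counts into a pressure percentage and buckets it. The 60% and 95% cutoffs and all names are illustrative assumptions, not the kernel's vmpressure code; they are exactly the kind of fixed threshold that, as noted above, behaves differently across hardware setups.

```c
#include <assert.h>
#include <stdint.h>

enum pressure_level { PRESSURE_LOW, PRESSURE_MEDIUM, PRESSURE_CRITICAL };

/* Assumed cutoffs: percent of scanned pages that reclaim failed to free. */
#define PRESSURE_MEDIUM_PCT   60
#define PRESSURE_CRITICAL_PCT 95

/* The less of what was scanned actually got reclaimed, the higher
 * the estimated pressure. */
static unsigned int pressure_pct(uint64_t scanned, uint64_t reclaimed)
{
	assert(scanned > 0 && reclaimed <= scanned);
	return (unsigned int)((scanned - reclaimed) * 100 / scanned);
}

static enum pressure_level classify_pressure(uint64_t scanned,
					     uint64_t reclaimed)
{
	unsigned int pct = pressure_pct(scanned, reclaimed);

	if (pct >= PRESSURE_CRITICAL_PCT)
		return PRESSURE_CRITICAL;
	if (pct >= PRESSURE_MEDIUM_PCT)
		return PRESSURE_MEDIUM;
	return PRESSURE_LOW;
}
```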
> > 
> > > How significant is this?  How much memory can RCU consume?
> > 
> > I think if RCU can end up consuming a significant share of memory, one
> > way that may work would be to do proper shrinker integration and track
> > the age of its objects relative to the age of other allocations in the
> > system. I.e. toss them all on a clock list with "new" bits and shrink
> > them at VM velocity. If the shrinker sees objects with new bit set,
> > clear and rotate. If it sees objects without them, we know rcu_heads
> > outlive cache pages etc. and should probably cycle faster too.
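The clock list with "new" bits is the classic CLOCK (second-chance) algorithm. A minimal sketch, assuming a hypothetical fixed-size array of objects whose "new" bit is set at enqueue time: a scan clears the bit on objects that have it and reclaims objects that do not. The `freed` flag stands in for whatever "reclaim" means for a given cache; for RCU the suggested response is to cycle faster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object on a clock list (e.g. a deferred-free item). */
struct clock_obj {
	bool new_bit; /* set at enqueue time (or on later reference) */
	bool freed;   /* stand-in for actually reclaiming the object */
};

/* Hypothetical fixed-size clock list with a rotating hand. */
struct clock_list {
	struct clock_obj *objs;
	size_t n;
	size_t hand; /* next slot the hand will examine */
};

/*
 * Examine up to nr_to_scan live objects, as a shrinker scan might:
 * objects with the new bit get a second chance (bit cleared, hand
 * moves on); objects without it are reclaimed. Returns the number
 * reclaimed. Each call takes at most 2 * n steps, so a list of
 * already-freed slots cannot loop forever.
 */
static size_t clock_scan(struct clock_list *l, size_t nr_to_scan)
{
	size_t examined = 0, reclaimed = 0, steps = 0;

	assert(l->n > 0 && l->hand < l->n);
	while (examined < nr_to_scan && steps < 2 * l->n) {
		struct clock_obj *o = &l->objs[l->hand];

		l->hand = (l->hand + 1) % l->n;
		steps++;
		if (o->freed)
			continue;
		examined++;
		if (o->new_bit) {
			o->new_bit = false; /* recently queued: clear and rotate */
		} else {
			o->freed = true;    /* stale since last sweep: reclaim */
			reclaimed++;
		}
	}
	return reclaimed;
}
```

A scan that reclaims most of what it examines means objects are going stale faster than they are re-marked, which corresponds to the "objects without new bits" case described above.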
> 
> It would be easy for RCU to pass back (or otherwise use) the age of the
> current grace period, if that would help.
> 
> Tracking the age of individual callbacks is out of the question due to
> memory overhead, but RCU could approximate this via statistical sampling.
> Comparing this to grace-period durations could give information as to
> whether making grace periods go faster would be helpful.
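Sampling avoids a timestamp in every callback. A minimal sketch, assuming hypothetical names, a 1-in-64 sampling rate, and a bounded sample buffer: record the enqueue time of every 64th callback, then express the mean sampled age as a percentage of the current grace-period duration. Retiring samples when their callbacks are invoked is omitted, and the sketch stops at computing the ratio; the thread does not specify which ratio would justify faster grace periods.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_EVERY 64 /* assumed: timestamp 1 in 64 callbacks */
#define MAX_SAMPLES  32 /* assumed: bound on stored samples */

/* Hypothetical sampler of callback enqueue times. */
struct cb_age_sampler {
	uint64_t enqueued;             /* callbacks seen so far */
	uint64_t when_ms[MAX_SAMPLES]; /* enqueue times of sampled callbacks */
	size_t nr;                     /* valid entries in when_ms[] */
};

/* Called on every enqueue; only every SAMPLE_EVERY-th one is recorded. */
static void sampler_note_enqueue(struct cb_age_sampler *s, uint64_t now_ms)
{
	if ((s->enqueued++ % SAMPLE_EVERY) == 0 && s->nr < MAX_SAMPLES)
		s->when_ms[s->nr++] = now_ms;
}

/*
 * Mean age of the sampled callbacks as a percentage of the current
 * grace-period duration (200 means "twice as old as a GP").
 * Returns 0 when nothing has been sampled.
 */
static uint64_t sampled_age_vs_gp_pct(const struct cb_age_sampler *s,
				      uint64_t now_ms, uint64_t gp_ms)
{
	uint64_t total = 0;
	size_t i;

	assert(gp_ms > 0);
	if (s->nr == 0)
		return 0;
	for (i = 0; i < s->nr; i++) {
		assert(now_ms >= s->when_ms[i]);
		total += now_ms - s->when_ms[i];
	}
	return total / s->nr * 100 / gp_ms;
}
```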
> 
> But, yes, it would be better to have an elusive unambiguous indication
> of distress.  ;-)

And I have dropped this patch for the time being, but I do hope that
it served a purpose in illustrating that it is not difficult to put RCU
into a fast-but-inefficient mode when needed.

							Thanx, Paul