Re: [RFC v1] mm: add page preemption

Hillf Danton <hdanton@xxxxxxxx> · Tue, 22 Oct 2019 22:28:02 +0800

On Tue, 22 Oct 2019 14:42:41 +0200 Michal Hocko wrote:
> 
> On Tue 22-10-19 20:14:39, Hillf Danton wrote:
> > 
> > On Mon, 21 Oct 2019 14:27:28 +0200 Michal Hocko wrote:
> [...]
> > > Why do we care and which workloads would benefit and how much.
> > 
> > Page preemption, disabled by default, should be turned on by those
> > who wish that the performance of their workloads can survive memory
> > pressure to certain extent.
> 
> I am sorry but this doesn't say anything to me. How come not all
> workloads would fit that description?

That means pp plays a role once kswapd becomes active, and it may prevent
too much jitter in the active LRU pages.

> > The number of pp users is supposed near the people who change the
> > nice value of their apps either to -1 or higher at least once a week,
> > less than vi users among UK's undergraduates.
> > 
> > > And last but not least why the existing infrastructure doesn't help
> > > (e.g. if you have clearly defined workloads with different
> > > memory consumption requirements then why don't you use memory cgroups to
> > > reflect the priority).
> > 
> > Good question:)
> > 
> > Though pp is implemented by preventing any task from reclaiming as many
> > pages as possible from other tasks that are higher on priority, it is
> > trying to introduce prio into page reclaiming, to add a feature.
> > 
> > Page and memcg are different objects after all; pp is being added at
> > the page granularity. It should be an option available in environments
> > without memcg enabled.
> 
> So do you actually want to establish LRUs per priority?

No, nothing changes other than a prio being added to every LRU page. Per-prio
LRUs are too much to implement.
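
To make that concrete, here is a rough user-space sketch of the check, not
code from the patch. struct pp_page, pp_may_deactivate() and pp_scan() are
made-up names, and it assumes that a larger prio value means higher priority
and that a page of equal prio may still be deactivated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an LRU page; only the added prio matters here. */
struct pp_page {
	int prio;	/* assumption: larger value means higher priority */
	bool active;	/* true while the page sits on the active list */
};

/*
 * May a reclaimer running at reclaimer_prio deactivate this page?
 * With pp disabled (the default) every page is fair game.
 */
static bool pp_may_deactivate(const struct pp_page *page, int reclaimer_prio,
			      bool pp_enabled)
{
	assert(page != NULL);
	if (!pp_enabled)
		return true;
	/* Pages owned by higher-prio tasks are left alone. */
	return page->prio <= reclaimer_prio;
}

/* Walk one flat list of pages and return how many were deactivated. */
static size_t pp_scan(struct pp_page *pages, size_t n, int reclaimer_prio,
		      bool pp_enabled)
{
	size_t i, nr_deactivated = 0;

	for (i = 0; i < n; i++) {
		if (!pages[i].active)
			continue;
		if (!pp_may_deactivate(&pages[i], reclaimer_prio, pp_enabled))
			continue;
		pages[i].active = false;
		nr_deactivated++;
	}
	return nr_deactivated;
}
```

The list itself is untouched: there is still one LRU, and prio is only an
extra factor consulted per page while scanning it.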

> Why using memcgs is not an option?

I plan to add prio to memcg. As you may have seen, I sent an RFC before v0 with
nice added to memcg, and realised a couple of days ago that its dependence on
soft limit reclaim is not acceptable.

But we can't do that without first deciding how to define a memcg's prio.
What I have in mind now is the highest (or lowest) prio of the tasks in a
memcg, with a knob offered to userspace.
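
Purely as an illustration of that idea, and nothing that exists yet: the enum
below stands in for the userspace knob, memcg_prio() is a made-up helper, and
a larger value is again assumed to mean higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical knob: which task prio represents the whole memcg. */
enum memcg_prio_mode {
	MEMCG_PRIO_HIGHEST,
	MEMCG_PRIO_LOWEST,
};

/*
 * Derive a memcg's prio from the prios of its tasks.
 * An empty memcg gets the caller-supplied fallback.
 */
static int memcg_prio(const int *task_prio, size_t ntasks,
		      enum memcg_prio_mode mode, int fallback)
{
	size_t i;
	int prio;

	assert(task_prio != NULL || ntasks == 0);
	if (ntasks == 0)
		return fallback;

	prio = task_prio[0];
	for (i = 1; i < ntasks; i++) {
		if (mode == MEMCG_PRIO_HIGHEST) {
			if (task_prio[i] > prio)
				prio = task_prio[i];
		} else {
			if (task_prio[i] < prio)
				prio = task_prio[i];
		}
	}
	return prio;
}
```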

If you like, I would be happy to discuss it sometime later.

> This is the main facility to partition reclaimable
> memory in the first place. You should really focus on explaining on why
> a much more fine grained control is needed much more thoroughly.
> 
> > What is way different from the protections offered by memory cgroup
> > is that pages protected by memcg:min/low can't be reclaimed regardless
> > of memory pressure. Such guarantee is not available under pp as it only
> > suggests an extra factor to consider on deactivating lru pages.
> 
> Well, low limit can be breached if there is no eligible memcg to be
> reclaimed. That means that you can shape some sort of priority by
> setting the low limit already.
> 
> [...]
> 
> > What was added on the reclaimer side is
> > 
> > 1, kswapd sets pgdat->kswapd_prio, the switch between page reclaimer
> >    and allocator in terms of prio, to the lowest value before taking
> >    a nap.
> > 
> > 2, any allocator is able to wake up the reclaimer because of the
> >    lowest prio, and it starts reclaiming pages using the waker's prio.
> > 
> > 3, allocator comes while kswapd is active, its prio is checked and
> >    no-op if kswapd is higher on prio; otherwise switch is updated
> >    with the higher prio.
> > 
> > 4, every time kswapd raises sc.priority that starts with DEF_PRIORITY,
> >    it is checked if there is pending update of switch; and kswapd's
> >    prio steps up if there is a pending one, thus its prio never steps
> >    down. Nor prio inversion. 
> > 
> > 5, goto 1 when kswapd finishes its work.
> 
> What about the direct reclaim?

The prio of a direct reclaimer does not change before its reclaim finishes,
so direct reclaim is left as is.
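
For reference, the five kswapd steps quoted above can be modelled in user
space roughly as follows. This is a sketch, not the patch: struct pp_pgdat,
the active flag, running_prio and all pp_*() helpers are made up for
illustration, a larger prio value is assumed to mean higher priority, and
"raising" sc.priority is modelled the way mainline does it, by counting down
from DEF_PRIORITY.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define PP_PRIO_LOWEST	INT_MIN	/* hypothetical lowest prio */
#define DEF_PRIORITY	12	/* same value as in mainline */

/* Hypothetical slice of pg_data_t with only the pp state. */
struct pp_pgdat {
	int kswapd_prio;	/* the switch between allocator and reclaimer */
	int running_prio;	/* prio kswapd is currently reclaiming with */
	bool active;		/* false while kswapd naps */
};

/* Steps 1 and 5: reset the switch to the lowest prio before napping. */
static void pp_kswapd_sleep(struct pp_pgdat *pgdat)
{
	assert(pgdat != NULL);
	pgdat->kswapd_prio = PP_PRIO_LOWEST;
	pgdat->running_prio = PP_PRIO_LOWEST;
	pgdat->active = false;
}

/*
 * Steps 2 and 3: called by an allocator. Returns true if kswapd was woken.
 * If kswapd is already active, only a higher prio updates the switch.
 */
static bool pp_wakeup_kswapd(struct pp_pgdat *pgdat, int waker_prio)
{
	assert(pgdat != NULL);
	if (!pgdat->active) {
		pgdat->active = true;
		pgdat->kswapd_prio = waker_prio;
		pgdat->running_prio = waker_prio;
		return true;
	}
	if (waker_prio > pgdat->kswapd_prio)
		pgdat->kswapd_prio = waker_prio;	/* pending update */
	return false;
}

/*
 * Step 4: called each time kswapd raises scan pressure. Picks up a pending
 * update of the switch, so the running prio only ever steps up.
 */
static int pp_kswapd_raise_priority(struct pp_pgdat *pgdat, int *sc_priority)
{
	assert(pgdat != NULL && sc_priority != NULL);
	if (*sc_priority > 0)
		(*sc_priority)--;
	if (pgdat->kswapd_prio > pgdat->running_prio)
		pgdat->running_prio = pgdat->kswapd_prio;
	return pgdat->running_prio;
}
```

Because the switch is only ever raised while kswapd is active, and is reset
only in pp_kswapd_sleep(), the running prio cannot step down in the middle of
a reclaim cycle.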

> What if pages of a lower priority are
> hard to reclaim? Do you want a process of a higher priority stall more
> just because it has to wait for those lower priority pages?

The problems above are not introduced by pp; let Mr. Kswapd take care of
them.

(It is 22:23 local time, let's continue after a 7-hour sleep. Good night.)

Thanks
Hillf