Re: [RFC v1] mm: add page preemption

Hillf Danton <hdanton@xxxxxxxx> · Wed, 23 Oct 2019 19:53:50 +0800

On Wed, 23 Oct 2019 10:17:29 +0200 Michal Hocko wrote:
> 
> On Tue 22-10-19 22:28:02, Hillf Danton wrote:
> > 
> > On Tue, 22 Oct 2019 14:42:41 +0200 Michal Hocko wrote:
> > > 
> > > On Tue 22-10-19 20:14:39, Hillf Danton wrote:
> > > > 
> > > > On Mon, 21 Oct 2019 14:27:28 +0200 Michal Hocko wrote:
> > > [...]
> > > > > Why do we care and which workloads would benefit and how much.
> > > > 
> > > > Page preemption, disabled by default, should be turned on by those
> > > > who wish that the performance of their workloads can survive memory
> > > > pressure to certain extent.
> > > 
> > > I am sorry but this doesn't say anything to me. How come not all
> > > workloads would fit that description?
> > 
> > That means pp plays a role when kswapd becomes active, and it may
> > prevent too much jitters in active lru pages.
> 
> This is still too vague to be useful in any way.

Page preemption is designed to function only under memory pressure, by
suggesting that kswapd skip deactivating some pages based on a prio
comparison. By design, no page is skipped unless a difference in prio is
found. That said, no workload can be singled out before its prio is updated,
so let users who know that their workloads are sensitive to jitter in lru
pages change the nice value.
We are simply adding the pp feature; users are responsible for turning pp
on and changing nice if they feel it necessary.
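
To make the comparison concrete, here is a minimal userspace sketch of the
check described above. It is not code from the RFC: struct pp_page,
struct pp_scan, pp_prio_higher() and pp_skip_deactivate() are hypothetical
names, and it assumes that a lower number means a higher priority, as with
task prio in the kernel.

```c
#include <stdbool.h>

/* Hypothetical types for illustration, not the RFC's actual structures. */
struct pp_page {
	int prio;		/* prio recorded for the page (assumption: lower value = higher priority) */
};

struct pp_scan {
	bool pp_enabled;	/* pp is disabled by default */
	int reclaimer_prio;	/* prio the reclaimer is currently running with */
};

/* True if prio a is strictly higher than prio b under the assumed ordering. */
static inline bool pp_prio_higher(int a, int b)
{
	return a < b;
}

/*
 * Return true if the reclaimer should skip deactivating this page.
 * Without pp enabled, or without a difference in prio, nothing is skipped.
 */
static bool pp_skip_deactivate(const struct pp_scan *sc,
			       const struct pp_page *page)
{
	if (!sc->pp_enabled)
		return false;
	return pp_prio_higher(page->prio, sc->reclaimer_prio);
}
```

With pp off, or with the page and the reclaimer at the same prio, the helper
returns false and reclaim behaves as it does today.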

> > > > The number of pp users is supposed near the people who change the
> > > > nice value of their apps either to -1 or higher at least once a week,
> > > > less than vi users among UK's undergraduates.
> > > > 
> > > > > And last but not least why the existing infrastructure doesn't help
> > > > > (e.g. if you have clearly defined workloads with different
> > > > > memory consumption requirements then why don't you use memory cgroups to
> > > > > reflect the priority).
> > > > 
> > > > Good question:)
> > > > 
> > > > Though pp is implemented by preventing any task from reclaiming as many
> > > > pages as possible from other tasks that are higher on priority, it is
> > > > trying to introduce prio into page reclaiming, to add a feature.
> > > > 
> > > > Page and memcg are different objects after all; pp is being added at
> > > > the page granularity. It should be an option available in environments
> > > > without memcg enabled.
> > > 
> > > So do you actually want to establish LRUs per priority?
> > 
> > No, no change other than the prio for every lru page was added. LRU per prio
> > is too much to implement.
> 
> Well, considering that per page priority is a no go as already pointed
> out by Willy then you do not have other choice right?

There is no need to look for another choice because of the prio introduced
into reclaim: by design, no one is hurt unless pp is enabled and prio is
updated.

> > > Why using memcgs is not an option?
> > 
> > I have plan to add prio in memcg. As you see, I sent a rfc before v0 with
> > nice added in memcg, and realised a couple days ago that its dependence on
> > soft limit reclaim is not acceptable.
> > 
> > But we can't do that without determining how to define memcg's prio.
> > What is in mind now is the highest (or lowest) prio of tasks in a memcg
> > with a knob offered to userspace.
> > 
> > If you like, I want to have a talk about it sometime later.
> 
> This doesn't really answer my question.
> Why cannot you use memcgs as they are now.

No prio provided.

> Why exactly do you need a fixed priority?

Prio comparison in global reclaim is what was added. That comparison is
possible because every task has a prio.

> > > This is the main facility to partition reclaimable
> > > memory in the first place.

Is every task (pid != 1) contained in a memcg? And why?

> > > You should really focus on explaining on why
> > > a much more fine grained control is needed much more thoroughly.

Which do you prefer, cello or fiddle? And why?

> > > > What is way different from the protections offered by memory cgroup
> > > > is that pages protected by memcg:min/low can't be reclaimed regardless
> > > > of memory pressure. Such guarantee is not available under pp as it only
> > > > suggests an extra factor to consider on deactivating lru pages.
> > > 
> > > Well, low limit can be breached if there is no eligible memcg to be
> > > reclaimed. That means that you can shape some sort of priority by
> > > setting the low limit already.
> > > 
> > > [...]
> > > 
> > > > What was added on the reclaimer side is
> > > > 
> > > > 1, kswapd sets pgdat->kswapd_prio, the switch between page reclaimer
> > > >    and allocator in terms of prio, to the lowest value before taking
> > > >    a nap.
> > > > 
> > > > 2, any allocator is able to wake up the reclaimer because of the
> > > >    lowest prio, and it starts reclaiming pages using the waker's prio.
> > > > 
> > > > 3, allocator comes while kswapd is active, its prio is checked and
> > > >    no-op if kswapd is higher on prio; otherwise switch is updated
> > > >    with the higher prio.
> > > > 
> > > > 4, every time kswapd raises sc.priority that starts with DEF_PRIORITY,
> > > >    it is checked if there is a pending update of the switch; kswapd's
> > > >    prio steps up if there is a pending one, thus its prio never steps
> > > >    down, and there is no prio inversion.
> > > > 
> > > > 5, goto 1 when kswapd finishes its work.
> > > 
> > > What about the direct reclaim?
> > 
> > Their prio will not change before reclaiming finishes, so leave it be.
> 
> This doesn't answer my question.

There is no prio inversion in direct reclaim, if that is what you mean.
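
For the kswapd side, steps 1 to 5 quoted above can be sketched roughly as
follows. This is a single-threaded userspace illustration, not the patch
itself: struct pp_pgdat, PP_PRIO_LOWEST, the kswapd_active flag and the
three helpers are hypothetical, locking is left out, and a lower number is
assumed to mean a higher priority. Only the kswapd_prio field name comes
from the description above.

```c
#include <stdbool.h>

#define PP_PRIO_LOWEST 139	/* assumption: lower value = higher priority */

/* Hypothetical stand-in for the per-node state; only kswapd_prio is named in the RFC text. */
struct pp_pgdat {
	int kswapd_prio;	/* the switch between allocator and reclaimer */
	int reclaim_prio;	/* prio kswapd is reclaiming with */
	bool kswapd_active;	/* simplification: explicit awake/asleep flag */
};

/* Steps 1 and 5: reset the switch to the lowest prio before taking a nap. */
static void pp_kswapd_sleep(struct pp_pgdat *pgdat)
{
	pgdat->kswapd_prio = PP_PRIO_LOWEST;
	pgdat->reclaim_prio = PP_PRIO_LOWEST;
	pgdat->kswapd_active = false;
}

/*
 * Steps 2 and 3: called by an allocator. Returns true if the switch was
 * updated, false if kswapd is already running at the same or a higher prio.
 */
static bool pp_wakeup_kswapd(struct pp_pgdat *pgdat, int waker_prio)
{
	if (!pgdat->kswapd_active) {
		/* Step 2: any waker succeeds; reclaim starts with its prio. */
		pgdat->kswapd_prio = waker_prio;
		pgdat->reclaim_prio = waker_prio;
		pgdat->kswapd_active = true;
		return true;
	}
	/* Step 3: no-op unless the allocator is higher on prio. */
	if (waker_prio >= pgdat->kswapd_prio)
		return false;
	pgdat->kswapd_prio = waker_prio;
	return true;
}

/* Step 4: called whenever kswapd raises sc.priority; prio only steps up. */
static void pp_kswapd_check_switch(struct pp_pgdat *pgdat)
{
	if (pgdat->kswapd_prio < pgdat->reclaim_prio)
		pgdat->reclaim_prio = pgdat->kswapd_prio;
}
```

In this sketch a pending update is simply a switch value higher on prio than
the one kswapd is reclaiming with, so the reclaim prio can only step up until
the next nap.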

> > > What if pages of a lower priority are
> > > hard to reclaim? Do you want a process of a higher priority stall more
> > > just because it has to wait for those lower priority pages?
> > 
> > The problems above are not introduced by pp, let Mr. Kswapd take care of
> > them.
> 
> No, this is not an answer.

Is pp making them worse?

Thanks
Hillf