Re: [RFC][PATCH 00/26] sched/numa

Avi Kivity <avi@xxxxxxxxxx> · Mon, 19 Mar 2012 14:07:23 +0200

On 03/19/2012 01:59 PM, Peter Zijlstra wrote:
> On Mon, 2012-03-19 at 13:42 +0200, Avi Kivity wrote:
> > > Now if you want to be able to scan per-thread, you need per-thread
> > > page-tables and I really don't want to ever see that. That will blow
> > > memory overhead and context switch times.
> > 
> > I thought of only duplicating down to the PDE level, that gets rid of
> > almost all of the overhead. 
>
> You still get the significant CR3 cost for thread switches. 

True.  Not so much for virt, which has one thread per cpu generally.

> [ /me grabs the SDM to find that PDE is what we in Linux call the pmd ]

Yes, sorry.
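
For a rough feel of the memory side, here is a back-of-the-envelope sketch.
It assumes x86-64 with 4-level paging, 4 KiB pages, 512 entries per table,
and a single contiguous, aligned mapping. It also assumes that "duplicating
down to the PDE level" means each thread gets private pgd/pud/pmd pages and
shares the PTE pages. The function names are made up for illustration and
are not kernel code.

```python
PAGE_SIZE = 4096   # bytes per page-table page (assumed: x86-64, 4 KiB pages)
ENTRIES = 512      # entries per page-table page


def ceil_div(a, b):
    return -(-a // b)


def table_pages(mapped_bytes):
    """Page-table pages per level for one contiguous, aligned mapping."""
    pte = ceil_div(mapped_bytes, PAGE_SIZE * ENTRIES)  # one PTE page maps 2 MiB
    pmd = ceil_div(pte, ENTRIES)                       # one pmd page maps 1 GiB
    pud = ceil_div(pmd, ENTRIES)                       # one pud page maps 512 GiB
    return {"pgd": 1, "pud": pud, "pmd": pmd, "pte": pte}


def per_thread_overhead(mapped_bytes, share_pte_pages):
    """Bytes of private page tables each extra thread would need."""
    pages = table_pages(mapped_bytes)
    if share_pte_pages:
        pages = dict(pages, pte=0)  # PTE pages are shared, not duplicated
    return sum(pages.values()) * PAGE_SIZE


GIB = 1 << 30
full = per_thread_overhead(4 * GIB, share_pte_pages=False)
pmd_only = per_thread_overhead(4 * GIB, share_pte_pages=True)
print(full // 1024, "KiB per thread with full duplication")
print(pmd_only // 1024, "KiB per thread when PTE pages are shared")
```

Under those assumptions a 4 GiB mapping costs about 8 MiB per thread when
fully duplicated and 24 KiB when only the upper levels are private. A
sparse address space would need more upper-level pages than this estimate.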

> That'll cut the memory overhead down but also severely impact the
> accuracy.
>
> Also, I still don't see how such a scheme would correctly identify
> per-cpu memory in guest kernels. While less frequent, it's still very
> common to do remote access to per-cpu data. So even if you did page
> granularity you'd get a fair amount of pages that are accessed by all
> threads (vcpus) in the scan interval, even though they're primarily
> accessed by just one.
>
> If you go to pmd level you get even less information.

That is true.  Which is why I like the explicit vnode thing.  The guest
kernel already knows how to affine vcpus to memory, we don't need to
scan to see if it's actually doing what we told it to do.  Scanning is
good for unmodified non-virt applications, or to prioritize the migration.
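
To make the accuracy point concrete, here is a toy model of accessed-bit
scanning. It is only a sketch: the access counts are invented, a scan is
assumed to see a single "touched in this interval" bit per vcpu per unit,
and scan() is a hypothetical helper, not an existing interface.

```python
PAGES_PER_PMD = 512  # assumed: 4 KiB pages, 2 MiB per pmd entry


def scan(accesses, pages_per_unit=1):
    """Map each scan unit to the set of vcpus that touched it.

    accesses maps (vcpu, page) -> access count in one scan interval.
    An accessed bit only records count > 0, so the counts are lost.
    """
    seen = {}
    for (vcpu, page), count in accesses.items():
        if count > 0:
            seen.setdefault(page // pages_per_unit, set()).add(vcpu)
    return seen


accesses = {}
for v in range(4):
    # Pages 0..3: strictly private, one per vcpu, all inside the same pmd.
    accesses[(v, v)] = 1000
    # Pages 1024..1027: per-cpu data, mostly local, with rare remote access.
    for other in range(4):
        accesses[(other, 1024 + v)] = 1000 if other == v else 1

by_page = scan(accesses)
by_pmd = scan(accesses, PAGES_PER_PMD)
print("page 0 seen by:", sorted(by_page[0]))
print("page 1024 seen by:", sorted(by_page[1024]))
print("pmd 0 seen by:", sorted(by_pmd[0]))
```

At page granularity the strictly private pages still look private, but the
per-cpu pages look shared by all four vcpus even though one vcpu does
nearly all of the accesses. At pmd granularity even the private pages are
merged into one unit that every vcpu appears to touch.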

-- 
error compiling committee.c: too many arguments to function
