Re: [PATCH 6/7] mm: parallelize deferred_init_memmap()

Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> · Wed, 6 May 2020 18:43:35 -0400

On Wed, May 06, 2020 at 03:36:54PM -0700, Alexander Duyck wrote:
> On Wed, May 6, 2020 at 3:21 PM Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote:
> >
> > On Tue, May 05, 2020 at 07:55:43AM -0700, Alexander Duyck wrote:
> > > One question about this data. What is the power management
> > > configuration on the systems when you are running these tests? I'm
> > > just curious if CPU frequency scaling, C states, and turbo are
> > > enabled?
> >
> > Yes, intel_pstate is loaded in active mode without hwp and with turbo enabled
> > (those power management docs are great by the way!) and intel_idle is in use
> > too.
> >
> > > I ask because that is what I have seen usually make the
> > > difference in these kind of workloads as the throughput starts
> > > dropping off as you start seeing the core frequency lower and more
> > > cores become active.
> >
> > If I follow, you're saying there's a chance performance would improve with the
> > above disabled, but how often would a system be configured that way?  Even if
> > it were faster, the machine is configured how it's configured, or am I missing
> > your point?
> 
> I think you might be missing my point. What I was getting at is that I
> know for performance testing sometimes C states and P states get
> disabled in order to get consistent results between runs, it sounds
> like you have them enabled though. I was just wondering if you had
> disabled them or not. If they were disabled then you wouldn't get the
> benefits of turbo and as such adding more cores wouldn't come at a
> penalty, while with it enabled the first few cores should start to
> slow down as they fell out of turbo mode. So it may be part of the
> reason why you are only hitting about 10x at full core count.

All right, that makes way more sense.

> As it stands I think your code may speed up a bit if you split the
> work up based on section instead of max order. That would get rid of
> any cache bouncing you may be doing on the pageblock flags and reduce
> the overhead for splitting the work up into individual pieces since
> each piece will be bigger.

See my other mail.