Re: [PATCH] mm/compaction: use proper zoneid for compaction_suitable()

Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> · Fri, 26 Jul 2019 11:26:02 +0100

On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote:
> On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> > > By now there're three compaction paths,
> > > - direct compaction
> > > - kcompactd compcation
> > > - proc triggered compaction
> > > When we do compaction in all these paths, we will use compaction_suitable()
> > > to check whether a zone is suitable to do compaction.
> > >
> > > There're some issues around the usage of compaction_suitable().
> > > We don't use the proper zoneid in kcompactd_node_suitable() when try to
> > > wakeup kcompactd. In the kcompactd compaction paths, we call
> > > compaction_suitable() twice and the zoneid isn't proper in the second call.
> > > For proc triggered compaction, the classzone_idx is always zero.
> > >
> > > In order to fix these issues, I change the type of classzone_idx in the
> > > struct compact_control from const int to int and assign the proper zoneid
> > > before calling compact_zone().
> > >
> >
> > What is actually fixed by this?
> >
> 
> Recently there's a page alloc failure on our server because the
> compaction can't satisfy it.

That could be for a wide variety of reasons. There are limits to how
aggressive compaction will be but if there are unmovable pages preventing
the allocation, no amount of cleverness with the wakeups will change that.

> This issue is unproducible, so I have to view the compaction code and
> find out the possible solutions.

For high allocation success rates, the focus should be on strictness of
fragmentation control (hard, multiple tradeoffs) or increasing the number
of pages that can be moved (very hard, multiple tradeoffs).

> When I'm reading these compaction code, I find some  misuse of
> compaction_suitable().
> But after you point out, I find that I missed something.
> The classzone_idx should represent the alloc request, otherwise we may
> do unnecessary compaction on a zone.
> Thanks a lot for your explaination.
> 

Exactly.

> Hi Andrew,
> 
> Pls. help drop this patch. Sorry about that.

Agreed but there is no need to apologise. The full picture of this problem
is not obvious, not described anywhere and it's extremely difficult to
test and verify.

> > > <SNIP>
> > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> > >                                                       cc.classzone_idx);
> > >       count_compact_event(KCOMPACTD_WAKE);
> > >
> > > -     for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> > > +     for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
> > >               int status;
> > >
> > >               zone = &pgdat->node_zones[zoneid];
> >
> > This variable can be updated by a wakeup while the loop is executing
> > making the loop more difficult to reason about given the exit conditions
> > can change.
> >
> 
> Thanks for your point out.
> 
> But seems there're still issues event without my change ?
> For example,
> If we call wakeup_kcompactd() while the kcompactd is running,
> we just modify the kcompactd_max_order and kcompactd_classzone_idx and
> then return.
> Then in another path, the wakeup_kcompactd() is called again,
> so kcompactd_classzone_idx and kcompactd_max_order will be override,
> that means the previous wakeup is missed.
> Right ?
> 

That's intended. When kcompactd wakes up, it takes a snapshot of what is
requested and works on that. Other requests can update the requirements for
a future compaction request if necessary. One could argue that the wakeup
is missed but really it's "defer that request to some kcompactd activity
in the future". If kcompactd loops until there are no more requests, it
can consume an excessive amount of CPU due to requests continually keeping
it awake. kcompactd is best-effort to reduce the amount of direct stalls
due to compaction but an allocation request always faces the possibility
that it may stall because a kernel thread has not made enough progress
or failed.

FWIW, similar problems hit kswapd in the past where allocation requests
could artifically keep it awake consuming 100% of CPU.

-- 
Mel Gorman
SUSE Labs