Re: [PATCH] mm/compaction: use proper zoneid for compaction_suitable()

Yafang Shao <laoar.shao@xxxxxxxxx> · Fri, 26 Jul 2019 19:13:44 +0800

On Fri, Jul 26, 2019 at 6:26 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote:
> > On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote:
> > > > By now there're three compaction paths,
> > > > - direct compaction
> > > > - kcompactd compcation
> > > > - proc triggered compaction
> > > > When we do compaction in all these paths, we will use compaction_suitable()
> > > > to check whether a zone is suitable to do compaction.
> > > >
> > > > There're some issues around the usage of compaction_suitable().
> > > > We don't use the proper zoneid in kcompactd_node_suitable() when try to
> > > > wakeup kcompactd. In the kcompactd compaction paths, we call
> > > > compaction_suitable() twice and the zoneid isn't proper in the second call.
> > > > For proc triggered compaction, the classzone_idx is always zero.
> > > >
> > > > In order to fix these issues, I change the type of classzone_idx in the
> > > > struct compact_control from const int to int and assign the proper zoneid
> > > > before calling compact_zone().
> > > >
> > >
> > > What is actually fixed by this?
> > >
> >
> > Recently there's a page alloc failure on our server because the
> > compaction can't satisfy it.
>
> That could be for a wide variety of reasons. There are limits to how
> aggressive compaction will be but if there are unmovable pages preventing
> the allocation, no amount of cleverness with the wakeups will change that.
>

Yes, we should know whether it is lack of movable pages or the
compaction can't catch up first.
I think it would be better if there're some debugging facilities could
help us do that.

> > This issue is unproducible, so I have to view the compaction code and
> > find out the possible solutions.
>
> For high allocation success rates, the focus should be on strictness of
> fragmentation control (hard, multiple tradeoffs) or increasing the number
> of pages that can be moved (very hard, multiple tradeoffs).
>

Agreed, that's a tradeoff.

> > When I'm reading these compaction code, I find some  misuse of
> > compaction_suitable().
> > But after you point out, I find that I missed something.
> > The classzone_idx should represent the alloc request, otherwise we may
> > do unnecessary compaction on a zone.
> > Thanks a lot for your explaination.
> >
>
> Exactly.
>
> > Hi Andrew,
> >
> > Pls. help drop this patch. Sorry about that.
>
> Agreed but there is no need to apologise. The full picture of this problem
> is not obvious, not described anywhere and it's extremely difficult to
> test and verify.
>
> > > > <SNIP>
> > > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
> > > >                                                       cc.classzone_idx);
> > > >       count_compact_event(KCOMPACTD_WAKE);
> > > >
> > > > -     for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) {
> > > > +     for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) {
> > > >               int status;
> > > >
> > > >               zone = &pgdat->node_zones[zoneid];
> > >
> > > This variable can be updated by a wakeup while the loop is executing
> > > making the loop more difficult to reason about given the exit conditions
> > > can change.
> > >
> >
> > Thanks for your point out.
> >
> > But seems there're still issues event without my change ?
> > For example,
> > If we call wakeup_kcompactd() while the kcompactd is running,
> > we just modify the kcompactd_max_order and kcompactd_classzone_idx and
> > then return.
> > Then in another path, the wakeup_kcompactd() is called again,
> > so kcompactd_classzone_idx and kcompactd_max_order will be override,
> > that means the previous wakeup is missed.
> > Right ?
> >
>
> That's intended. When kcompactd wakes up, it takes a snapshot of what is
> requested and works on that. Other requests can update the requirements for
> a future compaction request if necessary. One could argue that the wakeup
> is missed but really it's "defer that request to some kcompactd activity
> in the future". If kcompactd loops until there are no more requests, it
> can consume an excessive amount of CPU due to requests continually keeping
> it awake. kcompactd is best-effort to reduce the amount of direct stalls
> due to compaction but an allocation request always faces the possibility
> that it may stall because a kernel thread has not made enough progress
> or failed.
>
> FWIW, similar problems hit kswapd in the past where allocation requests
> could artifically keep it awake consuming 100% of CPU.
>

Understood. Thanks for your explanation again.

Thanks
Yafang