On Fri, Jul 26, 2019 at 6:26 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote: > > On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote: > > > > By now there're three compaction paths, > > > > - direct compaction > > > > - kcompactd compcation > > > > - proc triggered compaction > > > > When we do compaction in all these paths, we will use compaction_suitable() > > > > to check whether a zone is suitable to do compaction. > > > > > > > > There're some issues around the usage of compaction_suitable(). > > > > We don't use the proper zoneid in kcompactd_node_suitable() when try to > > > > wakeup kcompactd. In the kcompactd compaction paths, we call > > > > compaction_suitable() twice and the zoneid isn't proper in the second call. > > > > For proc triggered compaction, the classzone_idx is always zero. > > > > > > > > In order to fix these issues, I change the type of classzone_idx in the > > > > struct compact_control from const int to int and assign the proper zoneid > > > > before calling compact_zone(). > > > > > > > > > > What is actually fixed by this? > > > > > > > Recently there's a page alloc failure on our server because the > > compaction can't satisfy it. > > That could be for a wide variety of reasons. There are limits to how > aggressive compaction will be but if there are unmovable pages preventing > the allocation, no amount of cleverness with the wakeups will change that. > Yes, we should know whether it is lack of movable pages or the compaction can't catch up first. I think it would be better if there're some debugging facilities could help us do that. > > This issue is unproducible, so I have to view the compaction code and > > find out the possible solutions. > > For high allocation success rates, the focus should be on strictness of > fragmentation control (hard, multiple tradeoffs) or increasing the number > of pages that can be moved (very hard, multiple tradeoffs). > Agreed, that's a tradeoff. > > When I'm reading these compaction code, I find some misuse of > > compaction_suitable(). > > But after you point out, I find that I missed something. > > The classzone_idx should represent the alloc request, otherwise we may > > do unnecessary compaction on a zone. > > Thanks a lot for your explaination. > > > > Exactly. > > > Hi Andrew, > > > > Pls. help drop this patch. Sorry about that. > > Agreed but there is no need to apologise. The full picture of this problem > is not obvious, not described anywhere and it's extremely difficult to > test and verify. > > > > > <SNIP> > > > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) > > > > cc.classzone_idx); > > > > count_compact_event(KCOMPACTD_WAKE); > > > > > > > > - for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) { > > > > + for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) { > > > > int status; > > > > > > > > zone = &pgdat->node_zones[zoneid]; > > > > > > This variable can be updated by a wakeup while the loop is executing > > > making the loop more difficult to reason about given the exit conditions > > > can change. > > > > > > > Thanks for your point out. > > > > But seems there're still issues event without my change ? > > For example, > > If we call wakeup_kcompactd() while the kcompactd is running, > > we just modify the kcompactd_max_order and kcompactd_classzone_idx and > > then return. > > Then in another path, the wakeup_kcompactd() is called again, > > so kcompactd_classzone_idx and kcompactd_max_order will be override, > > that means the previous wakeup is missed. > > Right ? > > > > That's intended. When kcompactd wakes up, it takes a snapshot of what is > requested and works on that. Other requests can update the requirements for > a future compaction request if necessary. One could argue that the wakeup > is missed but really it's "defer that request to some kcompactd activity > in the future". If kcompactd loops until there are no more requests, it > can consume an excessive amount of CPU due to requests continually keeping > it awake. kcompactd is best-effort to reduce the amount of direct stalls > due to compaction but an allocation request always faces the possibility > that it may stall because a kernel thread has not made enough progress > or failed. > > FWIW, similar problems hit kswapd in the past where allocation requests > could artifically keep it awake consuming 100% of CPU. > Understood. Thanks for your explanation again. Thanks Yafang