On Fri, Jul 26, 2019 at 06:06:48PM +0800, Yafang Shao wrote: > On Fri, Jul 26, 2019 at 3:09 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > > > On Thu, Jul 25, 2019 at 09:50:21AM -0400, Yafang Shao wrote: > > > By now there're three compaction paths, > > > - direct compaction > > > - kcompactd compcation > > > - proc triggered compaction > > > When we do compaction in all these paths, we will use compaction_suitable() > > > to check whether a zone is suitable to do compaction. > > > > > > There're some issues around the usage of compaction_suitable(). > > > We don't use the proper zoneid in kcompactd_node_suitable() when try to > > > wakeup kcompactd. In the kcompactd compaction paths, we call > > > compaction_suitable() twice and the zoneid isn't proper in the second call. > > > For proc triggered compaction, the classzone_idx is always zero. > > > > > > In order to fix these issues, I change the type of classzone_idx in the > > > struct compact_control from const int to int and assign the proper zoneid > > > before calling compact_zone(). > > > > > > > What is actually fixed by this? > > > > Recently there's a page alloc failure on our server because the > compaction can't satisfy it. That could be for a wide variety of reasons. There are limits to how aggressive compaction will be but if there are unmovable pages preventing the allocation, no amount of cleverness with the wakeups will change that. > This issue is unproducible, so I have to view the compaction code and > find out the possible solutions. For high allocation success rates, the focus should be on strictness of fragmentation control (hard, multiple tradeoffs) or increasing the number of pages that can be moved (very hard, multiple tradeoffs). > When I'm reading these compaction code, I find some misuse of > compaction_suitable(). > But after you point out, I find that I missed something. > The classzone_idx should represent the alloc request, otherwise we may > do unnecessary compaction on a zone. > Thanks a lot for your explaination. > Exactly. > Hi Andrew, > > Pls. help drop this patch. Sorry about that. Agreed but there is no need to apologise. The full picture of this problem is not obvious, not described anywhere and it's extremely difficult to test and verify. > > > <SNIP> > > > @@ -2535,7 +2535,7 @@ static void kcompactd_do_work(pg_data_t *pgdat) > > > cc.classzone_idx); > > > count_compact_event(KCOMPACTD_WAKE); > > > > > > - for (zoneid = 0; zoneid <= cc.classzone_idx; zoneid++) { > > > + for (zoneid = 0; zoneid <= pgdat->kcompactd_classzone_idx; zoneid++) { > > > int status; > > > > > > zone = &pgdat->node_zones[zoneid]; > > > > This variable can be updated by a wakeup while the loop is executing > > making the loop more difficult to reason about given the exit conditions > > can change. > > > > Thanks for your point out. > > But seems there're still issues event without my change ? > For example, > If we call wakeup_kcompactd() while the kcompactd is running, > we just modify the kcompactd_max_order and kcompactd_classzone_idx and > then return. > Then in another path, the wakeup_kcompactd() is called again, > so kcompactd_classzone_idx and kcompactd_max_order will be override, > that means the previous wakeup is missed. > Right ? > That's intended. When kcompactd wakes up, it takes a snapshot of what is requested and works on that. Other requests can update the requirements for a future compaction request if necessary. One could argue that the wakeup is missed but really it's "defer that request to some kcompactd activity in the future". If kcompactd loops until there are no more requests, it can consume an excessive amount of CPU due to requests continually keeping it awake. kcompactd is best-effort to reduce the amount of direct stalls due to compaction but an allocation request always faces the possibility that it may stall because a kernel thread has not made enough progress or failed. FWIW, similar problems hit kswapd in the past where allocation requests could artifically keep it awake consuming 100% of CPU. -- Mel Gorman SUSE Labs