On Wed 29-05-13 15:05:38, Michal Hocko wrote:
> On Mon 27-05-13 19:13:08, Michal Hocko wrote:
> [...]
> > Nevertheless I have encountered an issue while testing the huge number
> > of groups scenario. And unfortunately the issue is not limited only to
> > this scenario. As memcg iterators use a per node-zone-priority
> > cache to prevent over-reclaim, it might quite easily happen that
> > the walk will not visit all groups and will either terminate the loop
> > prematurely or skip some groups. An example could be direct reclaim
> > racing with kswapd. This might cause the loop to miss over-limit
> > groups, so no pages are scanned and we fall back to all-groups
> > reclaim.
>
> And after some more testing and head scratching it turned out that the
> fallbacks to pass#2 I was seeing are caused by something else. It is
> not a race between iterators but rather reclaim from zone DMA, which
> has trouble scanning anything even though there are pages on the LRU,
> and so we fall back. I have to look into that more, but whatever the
> issue is, it shouldn't be related to the patch series.

I think I know what is going on. get_scan_count sees a relatively small
number of pages on the LRU lists (around 2k), because the DMA zone is
usually only ~16M. The scan target is roughly the LRU size shifted right
by the priority, so at DEF_PRIORITY (12) get_scan_count tells us to scan
nothing (2k >> 12 is 0). The DEF_PRIORITY pass is therefore basically a
no-op, and we have to wait until the priority drops to a level which
actually lets us scan something.

Hmm, maybe ignoring soft reclaim for the DMA zone would help to avoid one
pointless loop over the groups.

-- 
Michal Hocko
SUSE Labs