On Mon, Jul 18, 2011 at 12:20:11PM -0500, Christoph Lameter wrote:
>
>
> On Mon, 18 Jul 2011, Mel Gorman wrote:
>
> > On Mon, Jul 18, 2011 at 09:56:31AM -0500, Christoph Lameter wrote:
> > > On Fri, 15 Jul 2011, Mel Gorman wrote:
> > >
> > > > Currently the zonelist cache is set up only after the first zone has
> > > > been considered and zone_reclaim() has been called. The objective was
> > > > to avoid a costly setup but zone_reclaim is itself quite expensive. If
> > > > it is failing regularly such as the first eligible zone having mostly
> > > > mapped pages, the cost in scanning and allocation stalls is far higher
> > > > than the ZLC initialisation step.
> > >
> > > Would it not be easier to set zlc_active and allowednodes based on the
> > > zone having an active ZLC at the start of get_pages()?
> > >
> >
> > What do you mean by a zone's active ZLC? zonelists are on a per-node,
> > not a per-zone basis (see node_zonelist) so a zone doesn't have an
> > active ZLC as such. If the zlc_active is set at the beginning of
>
> Look at get_page_from_freelist(): It sets
> zlc_active = 0 even though the zonelist under consideration may have a
> ZLC. zlc_active = 0 can also mean that the function has not bothered to
> look for the zlc information of the current zonelist.
>

Yes. So? It's only necessary if the watermarks are not met.

> > get_page_from_freelist(), it implies that we are calling zlc_setup()
> > even when the watermarks are met which is unnecessary.
>
> Ok then that decision to not call zlc_setup() for performance reasons is
> what created the problem that you are trying to solve. In case that the
> first zone's watermarks are okay we can avoid calling zlc_setup().
>

The original implementation did not check the ZLC in the first loop at
all. It wasn't just about avoiding the cost of setup.
I suspect this problem has been there a long time and it's taking this
long for bug reports to show up because NUMA machines are being used for
generic NUMA-unaware workloads.

> What we do now have is checking for zlc_active in the loop just so that
> the first time around we do not call zlc_setup().
>

Yes, why incur the cost for the common case?

> We may be able to simplify the function by:
>
> 1. Checking for the special case that the first zone is ok and that we do
>    not want to call zlc_setup before we get to the loop.
>
> 2. Do the zlc_setup() before the loop.
>
> 3. Remove the zlc_setup() code as you did from the loop as well as the
>    checks for zlc_active. zlc_active becomes not necessary since a zlc
>    is always available when we go through the loop.
>

That initial test will involve duplication of things like the cpuset and
no watermarks check just to place the zlc_setup() in a different place. I
might be missing your point but it seems like the gain would be marginal.

Fancy posting a patch?

-- 
Mel Gorman
SUSE Labs
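
For readers following the thread, the lazy setup being debated can be
modelled in a few lines of user-space C. This is only a sketch under
loud assumptions: it is not kernel code, and every toy_* name, the
struct layouts and the fixed reclaim batch are hypothetical. Only the
ideas of zlc_setup(), zlc_active and zone_reclaim() come from the
discussion. Consulting the cache for the current zone straight after
setup is also an assumption of the sketch, not something the thread
states.

```c
/* Toy user-space model of lazy ZLC setup; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ZONES 8

struct toy_zone {
    long free_pages;   /* pages currently free */
    long watermark;    /* allocation needs free_pages above this */
    bool reclaimable;  /* whether toy_zone_reclaim() can free anything */
};

struct toy_zonelist {
    struct toy_zone zones[MAX_ZONES];
    size_t nr_zones;
    bool full[MAX_ZONES]; /* persistent "recently full" cache (the ZLC) */
};

struct toy_stats {
    int zlc_setups;    /* times the cache setup cost was paid */
    int reclaim_calls; /* times the expensive reclaim path ran */
};

/* Stand-in for zlc_setup(): only records that the cost was paid. */
static void toy_zlc_setup(struct toy_stats *st)
{
    st->zlc_setups++;
}

/* Stand-in for zone_reclaim(): frees a fixed batch if the zone allows it. */
static bool toy_zone_reclaim(struct toy_zone *z, struct toy_stats *st)
{
    st->reclaim_calls++;
    if (!z->reclaimable)
        return false;
    z->free_pages += 32;
    return true;
}

/* Returns the index of the zone that satisfied the request, or -1. */
static int toy_get_page(struct toy_zonelist *zl, struct toy_stats *st)
{
    bool zlc_active = false;

    assert(zl->nr_zones <= MAX_ZONES);

    for (size_t i = 0; i < zl->nr_zones; i++) {
        struct toy_zone *z = &zl->zones[i];

        /* Once the cache is active, skip zones it marks as full. */
        if (zlc_active && zl->full[i])
            continue;

        /* Common case: watermark met, no cache setup needed at all. */
        if (z->free_pages > z->watermark) {
            z->free_pages--;
            return (int)i;
        }

        /* First watermark failure: set up the cache BEFORE reclaim. */
        if (!zlc_active) {
            toy_zlc_setup(st);
            zlc_active = true;
            if (zl->full[i])
                continue; /* reclaim failed here recently; skip it */
        }

        if (toy_zone_reclaim(z, st) && z->free_pages > z->watermark) {
            z->free_pages--;
            return (int)i;
        }
        zl->full[i] = true; /* remember the failure for later calls */
    }
    return -1;
}
```

In this model a request whose first zone meets its watermark never
pays for setup, while a first zone that keeps failing reclaim is
skipped on later requests instead of being reclaimed again. Doing the
setup unconditionally before the loop, as in steps 1 to 3 above, would
remove the zlc_active test but charge the setup cost to every request
unless the first-zone check were duplicated ahead of it.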