On 02/27/2015 11:16 PM, David Rientjes wrote:
NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected. It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page allocator
slowpath to fail without trying reclaim even though it may be used in
combination with __GFP_WAIT.
An example of the problem this creates: commit e97ca8e5b864 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE. The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE. Converting GFP_THISNODE to __GFP_THISNODE is
a no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
It's time to just remove GFP_THISNODE entirely. We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity. Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.
This allows the aforementioned functions to actually reclaim as they
should. It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim. The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.
Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.
Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
So you've convinced me that this is a non-functional change for slab and
a prerequisity for patch 2/3 which is the main point of this series for
4.0. That said, the new 'goto nopage' condition is still triggered by a
combination of flag states, and the less we have those, the better for
us IMHO.
Looking at commit 952f3b51be which introduced this particular check
using GFP_THISNODE, it seemed like it was a workaround to avoid
triggering reclaim, without needing a new gfp flag. Nowadays, we have
such flag called __GFP_NO_KSWAPD and as I explained in my reply to v1
(where I missed the new condition), passing the flag would already
prevent wake_all_kswapds() and treating the allocation as atomic in
gfp_to_alloc_flags(). So the whole difference would be another
get_page_from_freelist() attempt (possibly less constrained than the
fast path one) and then bail out on !wait.
So it would be IMHO better for longer-term maintainability to have the
relevant __GFP_THISNODE callers pass also __GFP_NO_KSWAPD to denote
these opportunistic allocation attempts, instead of having this subtle
semantic difference attached to __GFP_THISNODE && !__GFP_WAIT. It would
be probably too risky for 4.0. On the other hand, I don't think even
this series is really urgent. It's true that patch 2/3 fixes two
commits, including a 4.0 one, but those commits are already not
regressions without the fix. But maybe current -rcX is low enough to
proceed with this series and catch any regressions in time, allowing the
larger cleanups later.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>