On 9/26/18 4:22 PM, Michal Hocko wrote: > On Wed 26-09-18 16:17:08, Michal Hocko wrote: >> On Wed 26-09-18 16:30:39, Kirill A. Shutemov wrote: >>> On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote: >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>>> index c3bc7e9c9a2a..c0bcede31930 100644 >>>> --- a/mm/huge_memory.c >>>> +++ b/mm/huge_memory.c >>>> @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf, >>>> * available >>>> * never: never stall for any thp allocation >>>> */ >>>> -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma) >>>> +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr) >>>> { >>>> const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); >>>> + gfp_t this_node = 0; >>>> + >>>> +#ifdef CONFIG_NUMA >>>> + struct mempolicy *pol; >>>> + /* >>>> + * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not >>>> + * specified, to express a general desire to stay on the current >>>> + * node for optimistic allocation attempts. If the defrag mode >>>> + * and/or madvise hint requires the direct reclaim then we prefer >>>> + * to fallback to other node rather than node reclaim because that >>>> + * can lead to excessive reclaim even though there is free memory >>>> + * on other nodes. We expect that NUMA preferences are specified >>>> + * by memory policies. >>>> + */ >>>> + pol = get_vma_policy(vma, addr); >>>> + if (pol->mode != MPOL_BIND) >>>> + this_node = __GFP_THISNODE; >>>> + mpol_cond_put(pol); >>>> +#endif >>> >>> I'm not very good with NUMA policies. Could you explain in more details how >>> the code above is equivalent to the code below? >> >> MPOL_PREFERRED is handled by policy_node() before we call __alloc_pages_nodemask. >> __GFP_THISNODE is applied only when we are not using >> __GFP_DIRECT_RECLAIM which is handled in alloc_hugepage_direct_gfpmask >> now. >> Lastly MPOL_BIND wasn't handled explicitly but in the end the removed >> late check would remove __GFP_THISNODE for it as well. So in the end we >> are doing the same thing unless I miss something > > Forgot to add. One notable exception would be that the previous code > would allow to hit > WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); > in policy_node if the requested node (e.g. cpu local one) was outside of > the mbind nodemask. This is not possible now. We haven't heard about any > such warning yet so it is unlikely that it happens though. I don't think the previous code could hit the warning, as the hugepage path that would add __GFP_THISNODE didn't call policy_node() (containing the warning) at all. IIRC early of your patch did hit the warning though, which is why you added the MPOL_BIND policy check.