On 1/16/20 11:22 AM, Mike Kravetz wrote:
On 1/15/20 11:59 PM, Michal Hocko wrote:
On Wed 15-01-20 13:07:17, Mike Kravetz wrote:
What should we do?
==================
1) Nothing more than optimizations by Li Xinhai. Behavior that could be
seen as conflicting with man page has existed since v3.12 and I am
not aware of any complaints.
2) In addition to optimizations by Li Xinhai, modify code to truly ignore
MPOL_MF_STRICT for huge page mappings. This would be fairly easy to do
after a failure of migrate_pages(). We could simply traverse the list
of pages that were not migrated looking for any non-hugetlb page.
3) Remove the statement "MPOL_MF_STRICT is ignored on huge page mappings."
and modify code accordingly.
My suggestion would be for 1 or 2. Thoughts?
And why do we exactly need to do anything at all? There is an
inconsistency that has been there for years without anybody noticing.
NUMA API is a mess on its own and unfixable at this stage, there will
always be some corner cases. If there is no real workload hitting this
incosistency and suffering, I would rather not touch this at all.
Unless the change would clean up the code or make it more maintainable.
That is a very valid point. Sometimes we as developers get focused on the
actual code changes and fail to ask the question "does this really need to
be changed?" or "what value do the code changes provide?".
Li Xinhai came up with two optimizations in how the mbind code deals with
hugetlb pages. This 'sub-optimal' code has existed for more than 6 years.
Unless I am mistaken, nobody has actually complained or noticed this behavior.
I believe Li Xinhai noticed this inefficient code via code inspection. Of
course, based on what we know today one could write a test program to show
the inefficient behavior. However, no real users have noticed this during
the past 6 years.
The proposed code changes are fairly simple. However, I would not say that
they clean up the code or make it more maintainable. They essentially add
or modify two checks to bail out early for hugetlb vma's if the flag which
is documented to not apply to hugetlb pages (MPOL_MF_STRICT) is specified.
If one is trying to follow the entire mbind code path for hugetlb pages,
these patches will make that easier follow/understand. That is simply
because one can ignore downstream code/functionality.
Based on Michal's criteria above, I now believe the code changes should not
be made. Yes, they are fairly simple. However, even simple changes have
the potential to break something (build breakage with v1 of patch). We should
leave this code as is unless issues are reported by users.
I tend to agree with you. And, according to what Horiguchi explained,
the intention was not ignoring hugetlb mappings when hugetlb migration
was added at the first place. And, I'm supposed all of us agree hugetlb
pages should be not treated specially although it is not a good timing
or there is not strong motivation to fix it right now (we may correct
the behavior in the future). The patch may convey the wrong information.
And, the code path is definitely not a hot path, so I'm fine to drop it.
And, I'm wondering if we need add some comments in the code to explain
the edge case just in case someone else repeat all the tedious history
digging.