On 28.10.2014 16:39, Rik van Riel wrote:
On 10/28/2014 08:58 AM, Rik van Riel wrote:
On 10/28/2014 08:12 AM, Andi Kleen wrote:
Alex Thorlton <athorlton@xxxxxxx> writes:
Last week, while discussing possible fixes for some unexpected/unwanted
behavior from khugepaged (see: https://lkml.org/lkml/2014/10/8/515),
several people mentioned possibly changing khugepaged to work as a
task_work function instead of a kernel thread. This will give us
finer-grained control over the page collapse scans, eliminate some
unnecessary scans since tasks that are relatively inactive will not be
scanned often, and eliminate the unwanted behavior described in the
email thread I mentioned.
With your change, what would happen in a single-threaded case?
Previously one core would scan and another would run the workload.
With your change both scanning and running would be on the same
core.
Would seem like a step backwards to me.
It's not just scanning, either.
Memory compaction can spend a lot of time waiting on
locks. Not consuming CPU or anything, but just waiting.
I am not convinced that moving all that waiting to task
context is a good idea.
It may be worth investigating how the hugepage code calls
the memory allocation & compaction code.
It's actually quite stupid, AFAIK. It will scan for collapse candidates,
and only then try to allocate a THP, which may involve compaction. If
that fails, the scanning time was wasted.
What could help would be to cache one or a few free huge pages per zone,
with the cache re-fill done asynchronously, e.g. via work queues. The
cache could benefit fault-THP allocations as well. We could also add
some logic so that if nobody uses the cached pages and memory is low,
they are freed. And importantly, if it's not possible to allocate huge
pages for the cache, then prevent scanning for collapse candidates, as
there's no point. (Well, this is probably more complex if some nodes can
allocate huge pages and others cannot.)
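
The cache idea above can be illustrated with a toy userspace model. This
is only a sketch in Python, not kernel code: the class
ZoneHugePageCache, its methods, and the allocate callback are all
hypothetical names invented for illustration, and the refill is called
directly here instead of being scheduled from a work queue. It also
makes one simplifying assumption: scanning is allowed only while a page
is already cached.

```python
from collections import deque
from typing import Callable, Deque, Optional


class ZoneHugePageCache:
    """Toy model of a per-zone cache of free huge pages (hypothetical)."""

    def __init__(self, allocate: Callable[[], Optional[int]],
                 capacity: int = 2) -> None:
        self.allocate = allocate    # stand-in for a huge page allocation attempt
        self.capacity = capacity    # "one or a few" pages per zone
        self.pages: Deque[int] = deque()

    def refill(self) -> None:
        """Top up the cache; in the proposal this would run asynchronously."""
        while len(self.pages) < self.capacity:
            page = self.allocate()
            if page is None:        # allocation failed, stop trying for now
                break
            self.pages.append(page)

    def take(self) -> Optional[int]:
        """Hand a cached page to a fault or collapse path, or None if empty."""
        return self.pages.popleft() if self.pages else None

    def shrink(self, memory_low: bool) -> int:
        """Free unused cached pages when memory is low; return how many."""
        if not memory_low:
            return 0
        freed = len(self.pages)
        self.pages.clear()
        return freed

    def should_scan(self) -> bool:
        """Assumed policy: scan for collapse candidates only if a huge
        page is already available, so scan time cannot be wasted."""
        return bool(self.pages)


# Fake allocator: succeeds three times, then fails (as if fragmented).
supply = iter([101, 102, 103])
cache = ZoneHugePageCache(allocate=lambda: next(supply, None), capacity=2)
cache.refill()
if cache.should_scan():
    page = cache.take()             # a collapse would consume this page
```

With this ordering, the allocation (and any compaction behind it)
happens before the scan instead of after it, so a failed allocation
costs no scan time. Handling per-node differences is left out here.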
For the scanning itself, I think NUMA balancing already does a similar
thing in task_work context, no?
Doing only async compaction from task_work context should
probably be ok.
I'm afraid that if we give up sync compaction here, then there will be
nothing left that defragments MIGRATE_UNMOVABLE pageblocks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>