Hi Michal,
I have considered your proposals:
1. Making memset(0) unconditional inside __init_single_page() is not
going to work because it slows down SPARC and ppc64. On SPARC, even the
BSTI optimization that I proposed earlier won't work: after
consulting with other engineers, I was told that stores (without loads!)
after BSTI without a membar are unsafe.
2. Adding ARCH_WANT_LARGE_PAGEBLOCK_INIT is not going to solve the
problem, because while an arch might want a large memset(), it still wants
to get the benefit of parallelized struct page initialization.
3. Another approach that I have considered is moving memset() above
__init_single_page() and doing it in larger chunks. However, this
solution is also not going to work, because inside the loops there are
cases where "struct page"s are skipped, so every single page is checked:
early_pfn_valid(pfn), early_pfn_in_nid(), and also the mirrored_kernelcore cases.
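To illustrate point 3, here is a minimal userspace sketch (not kernel code) of
an init loop that checks every pfn and skips some of them. All names ending in
_sketch are hypothetical stand-ins, and the "every 4th pfn is a hole" rule is
an assumption made only for the example. It shows that a single memset() over
the whole range would also touch entries that the loop deliberately skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page_sketch {
	unsigned long flags;
	int refcount;
};

/* Hypothetical per-pfn check: pretend every 4th pfn is a hole. */
static bool pfn_valid_sketch(unsigned long pfn)
{
	return (pfn % 4) != 3;
}

/* Zero and initialize a single entry, only after it passed the checks. */
static void init_single_page_sketch(struct page_sketch *page)
{
	memset(page, 0, sizeof(*page));
	page->refcount = 1;
}

/*
 * Walk [start, end) and initialize only the pfns that pass the check.
 * Skipped entries are left untouched. Returns the number initialized.
 */
static size_t init_range_sketch(struct page_sketch *map,
				unsigned long start, unsigned long end)
{
	size_t done = 0;

	assert(start <= end);
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (!pfn_valid_sketch(pfn))
			continue;	/* skipped: no memset, no init */
		init_single_page_sketch(&map[pfn - start]);
		done++;
	}
	return done;
}
```

Because the skip decision is made per pfn, the zeroing has to stay next to the
per-page init, or the chunked memset() has to replicate the same checks.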
> I wouldn't be so sure about this. If any other platform has a similar
> issues with small memset as sparc then the overhead is just papered over
> by parallel initialization.
That is true, and that is fine, because parallelization gives an order
of magnitude better improvement than the cost of slower
single-threaded performance. Remember, this happens only during boot and
memory hotplug, and is not something that will eat up computing resources
during runtime.
So, at the moment I cannot really find a better solution compared to
what I have proposed: do memset() inside __init_single_page() only when
deferred initialization is enabled.
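
A rough userspace sketch of that idea, not the actual patch: the runtime flag
deferred_init_enabled_sketch is a hypothetical stand-in for the build-time
deferred-init option, and the sketch assumes that when it is off, the caller
has already zeroed the whole array some other way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page_sketch {
	unsigned long flags;
	int refcount;
};

/* Hypothetical switch standing in for the deferred-init config option. */
static bool deferred_init_enabled_sketch = true;

/*
 * Zero the entry here only when deferred init is enabled, so the small
 * per-page memset() runs in the parallel init threads. Otherwise the
 * memory is assumed to have been zeroed in bulk beforehand.
 */
static void init_single_page_sketch(struct page_sketch *page)
{
	assert(page != NULL);
	if (deferred_init_enabled_sketch)
		memset(page, 0, sizeof(*page));
	page->refcount = 1;
}
```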
Pasha