On 06/16/2014 04:59 PM, Kirill A. Shutemov wrote:
On Mon, Jun 16, 2014 at 11:49:34PM +0300, Kirill A. Shutemov wrote:
On Mon, Jun 16, 2014 at 03:35:48PM -0400, Waiman Long wrote:
In the __split_huge_page_map() function, the check for
page_mapcount(page) is invariant within the for loop. Because the
macro is implemented using atomic_read(), the compiler cannot
optimize away the redundant check, leading to unnecessary reads of
the page structure.
And atomic_read() is *not* an atomic operation. It's implemented as a
dereference through a cast to volatile, which suppresses compiler
optimization but doesn't affect what the CPU can do with the variable.
So I doubt the difference will be measurable anywhere.
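For illustration, a minimal userspace sketch of the kind of read described above. The names my_atomic_t and my_atomic_read() are hypothetical stand-ins, not the kernel's definitions; the only point is that the load goes through a volatile-qualified lvalue, so the compiler must emit it every time, while no barrier or locked instruction is involved.

```c
/* Hypothetical stand-in for the kernel's atomic_t (assumption, not kernel code). */
typedef struct {
    int counter;
} my_atomic_t;

/*
 * Read the counter through a cast to volatile. The compiler may not cache,
 * merge, or drop this load, but the generated code is still an ordinary
 * load: nothing here constrains what the CPU does with the variable.
 */
static inline int my_atomic_read(const my_atomic_t *v)
{
    return *(const volatile int *)&v->counter;
}
```

Calling my_atomic_read(&x) twice in a row should therefore compile to two loads, even at -O2, where two plain reads of x.counter would normally be folded into one.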
Because it is treated as a volatile object, the compiler has to
reread the value of the relevant page structure field in every iteration
of the loop (512 for x86) when pmd_write(*pmd) is true. I saw a
slight improvement (about 2%) in a microbenchmark that I wrote to break up
1000 THPs with 1000 forked processes.
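The difference can be sketched with a simplified userspace model. This is not the actual __split_huge_page_map() code: struct fake_page, fake_page_mapcount(), and both loop functions are hypothetical, and the loop body is reduced to a counter. Variant A performs the volatile read on every iteration where the writable flag is set; variant B reads it once before the loop.

```c
#include <stddef.h>

#define NR_SUBPAGES 512 /* iteration count quoted above for x86 */

/* Hypothetical stand-in for struct page (assumption, not kernel code). */
struct fake_page {
    int mapcount;
};

/* Volatile read, modelled on the atomic_read() behaviour discussed above. */
static inline int fake_page_mapcount(const struct fake_page *page)
{
    return *(const volatile int *)&page->mapcount;
}

/* Variant A: the invariant check stays in the loop, so the compiler
 * must reload mapcount on each iteration where writable is true. */
int count_exclusive_reread(const struct fake_page *page, int writable)
{
    int n = 0;

    for (size_t i = 0; i < NR_SUBPAGES; i++) {
        if (writable && fake_page_mapcount(page) == 1)
            n++;
    }
    return n;
}

/* Variant B: the check is hoisted by hand, so mapcount is loaded once. */
int count_exclusive_hoisted(const struct fake_page *page, int writable)
{
    const int exclusive = (fake_page_mapcount(page) == 1);
    int n = 0;

    for (size_t i = 0; i < NR_SUBPAGES; i++) {
        if (writable && exclusive)
            n++;
    }
    return n;
}
```

Both variants return the same result when mapcount does not change during the loop; they differ only in the number of loads issued. Whether that shows up in a benchmark is exactly the question debated above.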
-Longman