[merged] mm-page_alloc-make-sure-__rmqueue-etc-always-inline.patch removed from -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm/page_alloc: make sure __rmqueue() etc are always inline
has been removed from the -mm tree.  Its filename was
     mm-page_alloc-make-sure-__rmqueue-etc-always-inline.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Aaron Lu <aaron.lu@xxxxxxxxx>
Subject: mm/page_alloc: make sure __rmqueue() etc are always inline

__rmqueue(), __rmqueue_fallback(), __rmqueue_smallest() and
__rmqueue_cma_fallback() are all in page allocator's hot path and better
be finished as soon as possible.  One way to make them faster is by making
them inline.  But as Andrew Morton and Andi Kleen pointed out:
https://lkml.org/lkml/2017/10/10/1252
https://lkml.org/lkml/2017/10/10/1279
To make sure they are inlined, we should use __always_inline for them.

With the will-it-scale/page_fault1/process benchmark, when using nr_cpu
processes to stress buddy, the results for will-it-scale.processes with
and without the patch are:

On a 2-sockets Intel-Skylake machine:

 compiler          base        head
gcc-4.4.7       6496131     6911823 +6.4%
gcc-4.9.4       7225110     7731072 +7.0%
gcc-5.4.1       7054224     7688146 +9.0%
gcc-6.2.0       7059794     7651675 +8.4%

On a 4-sockets Intel-Skylake machine:

 compiler          base        head
gcc-4.4.7      13162890    13508193 +2.6%
gcc-4.9.4      14997463    15484353 +3.2%
gcc-5.4.1      14708711    15449805 +5.0%
gcc-6.2.0      14574099    15349204 +5.3%

The above 4 compilers are used because I've done the tests through Intel's
Linux Kernel Performance(LKP) infrastructure and they are the available
compilers there.

The benefit being less on 4 sockets machine is due to the lock contention
there(perf-profile/native_queued_spin_lock_slowpath=81%) is less severe
than on the 2 sockets machine(85%).

What the benchmark does is: it forks nr_cpu processes and then each
process does the following:
    1 mmap() 128M anonymous space;
    2 writes to each page there to trigger actual page allocation;
    3 munmap() it.
in a loop.
https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c

Binary size wise, I have locally built them with different compilers:

[aaron@aaronlu obj]$ size */*/mm/page_alloc.o
   text    data     bss     dec     hex filename
  37409    9904    8524   55837    da1d gcc-4.9.4/base/mm/page_alloc.o
  38273    9904    8524   56701    dd7d gcc-4.9.4/head/mm/page_alloc.o
  37465    9840    8428   55733    d9b5 gcc-5.5.0/base/mm/page_alloc.o
  38169    9840    8428   56437    dc75 gcc-5.5.0/head/mm/page_alloc.o
  37573    9840    8428   55841    da21 gcc-6.4.0/base/mm/page_alloc.o
  38261    9840    8428   56529    dcd1 gcc-6.4.0/head/mm/page_alloc.o
  36863    9840    8428   55131    d75b gcc-7.2.0/base/mm/page_alloc.o
  37711    9840    8428   55979    daab gcc-7.2.0/head/mm/page_alloc.o

Text size increased about 800 bytes for mm/page_alloc.o.

[aaron@aaronlu obj]$ size */*/vmlinux
   text    data     bss     dec       hex     filename
10342757   5903208 17723392 33969357  20654cd gcc-4.9.4/base/vmlinux
10342757   5903208 17723392 33969357  20654cd gcc-4.9.4/head/vmlinux
10332448   5836608 17715200 33884256  2050860 gcc-5.5.0/base/vmlinux
10332448   5836608 17715200 33884256  2050860 gcc-5.5.0/head/vmlinux
10094546   5836696 17715200 33646442  201676a gcc-6.4.0/base/vmlinux
10094546   5836696 17715200 33646442  201676a gcc-6.4.0/head/vmlinux
10018775   5828732 17715200 33562707  2002053 gcc-7.2.0/base/vmlinux
10018775   5828732 17715200 33562707  2002053 gcc-7.2.0/head/vmlinux

Text size for vmlinux has no change though, probably due to function
alignment.

Link: http://lkml.kernel.org/r/20171013063111.GA26032@xxxxxxxxx
Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
Cc: Huang Ying <ying.huang@xxxxxxxxx>
Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Kemi Wang <kemi.wang@xxxxxxxxx>
Cc: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page_alloc.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff -puN mm/page_alloc.c~mm-page_alloc-make-sure-__rmqueue-etc-always-inline mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-make-sure-__rmqueue-etc-always-inline
+++ a/mm/page_alloc.c
@@ -1813,7 +1813,7 @@ static void prep_new_page(struct page *p
  * Go through the free lists for the given migratetype and remove
  * the smallest available page from the freelists
  */
-static inline
+static __always_inline
 struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 						int migratetype)
 {
@@ -1857,7 +1857,7 @@ static int fallbacks[MIGRATE_TYPES][4] =
 };
 
 #ifdef CONFIG_CMA
-static struct page *__rmqueue_cma_fallback(struct zone *zone,
+static __always_inline struct page *__rmqueue_cma_fallback(struct zone *zone,
 					unsigned int order)
 {
 	return __rmqueue_smallest(zone, order, MIGRATE_CMA);
@@ -2238,7 +2238,7 @@ static bool unreserve_highatomic_pageblo
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
  */
-static inline bool
+static __always_inline bool
 __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 {
 	struct free_area *area;
@@ -2310,8 +2310,8 @@ do_steal:
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
  */
-static struct page *__rmqueue(struct zone *zone, unsigned int order,
-				int migratetype)
+static __always_inline struct page *
+__rmqueue(struct zone *zone, unsigned int order, int migratetype)
 {
 	struct page *page;
 
_

Patches currently in -mm which might be from aaron.lu@xxxxxxxxx are


--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux