+ mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted.patch added to -mm tree

The patch titled
     Subject: mm/large system hash: clear hashdist when only one node with memory is booted
has been added to the -mm tree.  Its filename is
     mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included in linux-next and is updated
there every 3-4 working days.

------------------------------------------------------
From: Nicholas Piggin <npiggin@xxxxxxxxx>
Subject: mm/large system hash: clear hashdist when only one node with memory is booted

CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally,
even when booting on single-node machines.  This causes the large system
hashes to be allocated with vmalloc and mapped with small pages.

This change clears hashdist if only one node has come up with memory.
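
The decision described above can be modelled in userspace.  This is a
minimal sketch, not the kernel code: effective_hashdist() and its
parameter names are hypothetical, `configured` stands for whatever value
hashdist holds when page_alloc_init() runs, and `nodes_with_memory`
stands in for num_node_state(N_MEMORY).

```c
#include <assert.h>

/*
 * Userspace model of the check added to page_alloc_init().
 * Hypothetical helper for illustration only, not a kernel function.
 *
 * configured:        value of hashdist when the check runs
 * nodes_with_memory: stand-in for num_node_state(N_MEMORY)
 */
static int effective_hashdist(int configured, int nodes_with_memory)
{
	/* Assumption of this sketch: at least one node has memory. */
	assert(nodes_with_memory >= 1);

	/* With a single memory node there is nothing to distribute over. */
	if (nodes_with_memory == 1)
		return 0;

	/* Otherwise the configured value is left untouched. */
	return configured;
}
```

In this model a single-node boot always yields 0, while a boot with two
or more memory nodes keeps the configured value.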

This results in the important large inode and dentry hashes using memblock
allocations.  All other hashes stay within 4MB up to about 128GB of RAM,
which allows them to be allocated from the linear map on most non-NUMA
images.

Other big hashes like futex and TCP should eventually be moved over to the
same style of allocation as those vfs caches that use HASH_EARLY if
!hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images.
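
The allocation choices discussed above can be sketched as a small
userspace model.  Everything here is an assumption made for
illustration: the enum, pick_hash_alloc() and SKETCH_MAX_CONTIG_BYTES are
hypothetical names, the 4MB ceiling is taken from the figure quoted
above, and the vmalloc fallback for oversized tables is inferred from the
title of the companion patch rather than from this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a large system hash ends up in this simplified model. */
enum hash_alloc_path {
	ALLOC_MEMBLOCK,	/* early boot allocator, linear map */
	ALLOC_VMALLOC,	/* vmalloc space, mapped with small pages */
	ALLOC_PAGES,	/* page allocator, linear map */
};

/* Assumed ceiling for a physically contiguous allocation (4MB). */
#define SKETCH_MAX_CONTIG_BYTES	((size_t)4 << 20)

/*
 * early_capable: the caller can allocate during early boot, as the vfs
 *                caches that use HASH_EARLY do
 * hashdist:      the hash should be distributed across nodes
 * size:          table size in bytes
 */
static enum hash_alloc_path pick_hash_alloc(bool early_capable,
					    bool hashdist, size_t size)
{
	assert(size > 0);	/* a hash table has at least one bucket */

	/* Early-capable hashes use memblock only when not distributing. */
	if (early_capable && !hashdist)
		return ALLOC_MEMBLOCK;

	/* Distributed or oversized tables are served by vmalloc. */
	if (hashdist || size > SKETCH_MAX_CONTIG_BYTES)
		return ALLOC_VMALLOC;

	/* Everything else fits in a contiguous page allocation. */
	return ALLOC_PAGES;
}
```

In this model, an 8MB dentry hash moves from vmalloc to memblock once
hashdist is cleared, while a late-allocated 8MB table still falls back
to vmalloc because it exceeds the assumed ceiling.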

This reduces dTLB misses for a `git diff` of the Linux kernel tree from
~45,000 to ~8,000 on a Kaby Lake KVM guest with an 8MB dentry hash and
mitigations=off.  The performance change is in the noise (under 1%
difference), likely because the page tables are well cached for this
workload.

Link: http://lkml.kernel.org/r/20190605144814.29319-2-npiggin@xxxxxxxxx
Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
Reviewed-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page_alloc.c |   31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

--- a/mm/page_alloc.c~mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted
+++ a/mm/page_alloc.c
@@ -7535,10 +7535,28 @@ static int page_alloc_cpu_dead(unsigned
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
+int hashdist = HASHDIST_DEFAULT;
+
+static int __init set_hashdist(char *str)
+{
+	if (!str)
+		return 0;
+	hashdist = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("hashdist=", set_hashdist);
+#endif
+
 void __init page_alloc_init(void)
 {
 	int ret;
 
+#ifdef CONFIG_NUMA
+	if (num_node_state(N_MEMORY) == 1)
+		hashdist = 0;
+#endif
+
 	ret = cpuhp_setup_state_nocalls(CPUHP_PAGE_ALLOC_DEAD,
 					"mm/page_alloc:dead", NULL,
 					page_alloc_cpu_dead);
@@ -7923,19 +7941,6 @@ out:
 	return ret;
 }
 
-#ifdef CONFIG_NUMA
-int hashdist = HASHDIST_DEFAULT;
-
-static int __init set_hashdist(char *str)
-{
-	if (!str)
-		return 0;
-	hashdist = simple_strtoul(str, &str, 0);
-	return 1;
-}
-__setup("hashdist=", set_hashdist);
-#endif
-
 #ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
 /*
  * Returns the number of pages that arch has reserved but
_

Patches currently in -mm which might be from npiggin@xxxxxxxxx are

mm-large-system-hash-use-vmalloc-for-size-max_order-when-hashdist.patch
mm-large-system-hash-clear-hashdist-when-only-one-node-with-memory-is-booted.patch



