On Thu, Sep 2, 2021 at 5:31 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Sep 1, 2021 at 10:24 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > But what you could do, if you wanted to, would be to catch the
> > situation where you have lots of expensive NUMA accesses either using
> > our VM infrastructure or performance counters, and when the mapping is
> > a MAP_PRIVATE you just do a COW fault on them.
> >
> > Sounds entirely doable, and has absolutely nothing to do with the page
> > cache. It would literally just be an "over-eager COW fault triggered
> > by NUMA access counters".
>
> Note how it would work perfectly fine for anonymous mappings too. Just
> to reinforce the point that this has nothing to do with any page cache
> issues.
>
> Of course, if you want to actually then *share* pages within a node
> (rather than replicate them for each process), that gets more
> exciting.
>
> But I suspect that this is mainly only useful for long-running big
> processes (not least due to that node binding thing), so I question
> the need for that kind of excitement.

In Linux server scenarios, it is quite common to have big, long-running
processes constantly running on one machine, for example web servers
and databases. A process of this kind can span several NUMA nodes and
use all the CPUs in a server to achieve maximum throughput.

SGI/HPE has a numatool with a command, "dplace", to help deploy
processes with replicated text in either libraries or the binary
(a.out) [1]:

    dplace [-e] [-c cpu_numbers] [-s skip_count] [-n process_name] \
           [-x skip_mask] [-r [l|b|t]] [-o log_file] [-v 1|2] \
           command [command-args]

The dplace command accepts the following options:
...
-r: Specifies that text should be replicated on the node or nodes
    where the application is running. In some cases, replication will
    improve performance by reducing the need to make offnode memory
    references for code. The replication option applies to all
    programs placed by the dplace command. See the dplace man page for
    additional information on text replication. The replication
    options are a string of one or more of the following characters:

    l - Replicate library text
    b - Replicate binary (a.out) text
    t - Thread round-robin option

On the other hand, it would also be interesting to investigate whether
kernel text replication can help improve performance. MIPS does have
REPLICATE_KTEXT support in the kernel:

    config REPLICATE_KTEXT
            bool "Kernel text replication support"
            depends on SGI_IP27
            select MAPPED_KERNEL
            help
              Say Y here to enable replicating the kernel text across
              multiple nodes in a NUMA cluster. This trades memory for
              speed.

I'm not quite sure how it would benefit x86 and ARM64, though it seems
concurrent-rt has a solution and benchmark data for RedHawk Linux [2].

[1] http://www.nacad.ufrj.br/online/sgi/007-5646-002/sgi_html/ch05.html
[2] https://www.concurrent-rt.com/wp-content/uploads/2016/11/kernel-page-replication.pdf

>
>              Linus

Thanks
Barry
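
For reference, the MAP_PRIVATE behaviour that the "over-eager COW fault" idea builds on can be seen from userspace. The following is only a minimal sketch of the ordinary, write-triggered COW semantics: it does not involve NUMA access counters or any kernel change, and the function name private_write_is_isolated is made up for illustration.

```python
import mmap
import os
import tempfile


def private_write_is_isolated(payload: bytes = b"text page"):
    """Write through a MAP_PRIVATE mapping; return (mapping bytes, file bytes).

    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # MAP_PRIVATE: the first write to a page makes the kernel give this
        # process its own copy of that page (a COW fault).
        with mmap.mmap(fd, len(payload), flags=mmap.MAP_PRIVATE,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[0:1] = b"T"  # this write is what triggers the copy
            seen_in_mapping = m[:]
        # The backing file (and hence the page cache) is not modified.
        with open(path, "rb") as f:
            seen_in_file = f.read()
        return seen_in_mapping, seen_in_file
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    mapped, on_disk = private_write_is_isolated()
    print("mapping:", mapped, "file:", on_disk)
```

Here the copy is made because the process writes to the page. In the scheme quoted above, the kernel would instead force the same kind of copy when it sees expensive remote accesses, so that the private copy lands on the local node.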