Re: [PATCH RFC 00/12] x86 NUMA-aware kernel replication

Hi Artem,

> 
> Preliminary performance evaluation results:
> Processor Intel(R) Xeon(R) CPU E5-2690
> 2 nodes with 12 CPU cores each
> 
> fork/1 - Only a single invocation of the system call is timed.
>          The measurement spans from entering to exiting the system call.
> 
> fork/1024 - The system call is invoked 1024 times in a loop.
>             The time between entering a loop and exiting it was measured.
> 
> mmap/munmap - A set of 1024 pages (PAGE_SIZE defaults to 4096 if it is not defined)
>               was mapped with the mmap syscall and unmapped with munmap,
>               one page mapped/unmapped per loop iteration.
> 
> mmap/lock - The same as above, but with the MAP_LOCKED flag added.
> 
> open/close - The /dev/null pseudo-file was opened and closed in a loop 1024 times.
>              It was opened and closed once per iteration.
> 
> mount - The procfs pseudo-filesystem was mounted once on a temporary directory inside /tmp.
>         The time between entering and exiting the system call was measured.
> 
> kill - A signal handler for SIGUSR1 was set up. The signal was sent to a child process
>        created with glibc's fork() wrapper. The time between sending and receiving
>        the SIGUSR1 signal was measured.
> 
> Hot caches:
> 
> fork-1          2.3%
> fork-1024       10.8%
> mmap/munmap     0.4%
> mmap/lock       4.2%
> open/close      3.2%
> kill            4%
> mount           8.7%
> 
> Cold caches:
> 
> fork-1          42.7%
> fork-1024       17.1%
> mmap/munmap     0.4%
> mmap/lock       1.5%
> open/close      0.4%
> kill            26.1%
> mount           4.1%
> 


I've tested this patchset on an AMD EPYC 7713 64-Core Processor system (dual socket, 2 NUMA nodes, 64 CPUs per node).
I've implemented the syscall-based test cases described in your previous mail. I isolated the second NUMA node using the isolcpus and nohz_full boot parameters and ran the tests on CPUs belonging to that node.
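
For illustration only, the loop-based test cases (fork-1/fork-1024, open/close, mmap/munmap) and the CPU pinning can be approximated by a small Python sketch like the one below. It is not the code behind the numbers in this mail. `pin_to_cpus` and the `time_*` helpers are hypothetical names. Python interpreter overhead also inflates absolute times well beyond what a native test would show, so only relative comparisons between kernels make sense.

```python
import mmap
import os
import time

NR_ITERS = 1024            # loop count used by fork-1024, open/close and mmap/munmap
PAGE_SIZE = mmap.PAGESIZE  # 4096 on x86-64


def pin_to_cpus(cpus):
    """Restrict this process to `cpus` (hypothetical: the CPUs isolated via isolcpus=)."""
    os.sched_setaffinity(0, cpus)


def time_fork_loop(iterations=NR_ITERS):
    """Time a loop of `iterations` fork() calls; iterations=1 approximates fork-1.

    Each child exits immediately. Children are reaped only after the timed
    region, so up to `iterations` zombie processes exist at once.
    """
    pids = []
    start = time.perf_counter_ns()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit without running Python cleanup handlers
        pids.append(pid)
    elapsed = time.perf_counter_ns() - start
    for pid in pids:
        os.waitpid(pid, 0)
    return elapsed


def time_open_close(path="/dev/null", iterations=NR_ITERS):
    """Time a loop that opens and closes `path` once per iteration."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fd = os.open(path, os.O_RDONLY)
        os.close(fd)
    return time.perf_counter_ns() - start


def time_mmap_munmap(nr_pages=NR_ITERS):
    """Map and unmap one anonymous private page per loop iteration."""
    start = time.perf_counter_ns()
    for _ in range(nr_pages):
        m = mmap.mmap(-1, PAGE_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
        m.close()  # unmaps the page
    return time.perf_counter_ns() - start
```

A possible invocation is `pin_to_cpus(set(range(64, 128)))` followed by `time_fork_loop(1)`. That CPU range is a hypothetical example. The actual list for the second node can be read from /sys/devices/system/node/node1/cpulist. Reaping children outside the timed region follows the description that only the loop itself is timed.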

Performance evaluation results (% gain over the base 6.5.0-rc5 kernel):

Hot caches:
fork-1          1.1%
fork-1024       -3.8%
mmap/munmap     -1.5%
mmap/lock       -4.7%
open/close      -6.8%
kill            3.3%
mount           -13.0%

Cold caches:
fork-1          1.2%
fork-1024       -7.2%
mmap/munmap     -1.6%
mmap/lock       -1.0%
open/close      4.6%
kill            -54.2%
mount           -8.5%
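
This sketch assumes "% gain" denotes the relative reduction in elapsed time versus the base kernel. Under that reading, negative entries are slowdowns, and each entry corresponds to a calculation like the one below. `percent_gain` is a hypothetical helper, not part of the test suite.

```python
def percent_gain(base_time, patched_time):
    """Percentage gain of the patched kernel over the base kernel.

    Assumes both arguments are elapsed times for the same test (lower is
    better): a positive result means the patched kernel is faster, a
    negative result means it is slower.
    """
    if base_time <= 0:
        raise ValueError("base_time must be positive")
    return (base_time - patched_time) / base_time * 100.0
```

For example, hypothetical timings of 100 us on the base kernel and 113 us with the patchset give a gain of about -13%.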

Thanks,
Shivank
