On 1/25/2024 7:30 AM, Garg, Shivank wrote:
> Hi Artem,
>
>> Preliminary performance evaluation results:
>> Processor: Intel(R) Xeon(R) CPU E5-2690
>> 2 nodes with 12 CPU cores each
>>
>> fork/1 - A single invocation of the system call is timed.
>> The measurement covers the interval between entering and exiting the system call.
>>
>> fork/1024 - The system call is invoked in a loop 1024 times.
>> The time between entering the loop and exiting it was measured.
>>
>> mmap/munmap - A set of 1024 pages (page size PAGE_SIZE, or 4096 if it is not defined)
>> was mapped using the mmap syscall and unmapped using munmap.
>> One page is mapped and unmapped per loop iteration.
>>
>> mmap/lock - The same as above, but with the MAP_LOCKED flag added.
>>
>> open/close - The /dev/null pseudo-file was opened and closed in a loop 1024 times.
>> It was opened and closed once per iteration.
>>
>> mount - The procfs pseudo-filesystem was mounted once to a temporary directory inside /tmp.
>> The time between entering and exiting the system call was measured.
>>
>> kill - A signal handler for SIGUSR1 was set up. The signal was sent to a child process
>> created using glibc's fork wrapper. The time between sending and receiving
>> the SIGUSR1 signal was measured.
>>
>> Hot caches:
>>
>> fork-1       2.3%
>> fork-1024   10.8%
>> mmap/munmap  0.4%
>> mmap/lock    4.2%
>> open/close   3.2%
>> kill         4%
>> mount        8.7%
>>
>> Cold caches:
>>
>> fork-1      42.7%
>> fork-1024   17.1%
>> mmap/munmap  0.4%
>> mmap/lock    1.5%
>> open/close   0.4%
>> kill        26.1%
>> mount        4.1%
>>
> I've conducted some testing on an AMD EPYC 7713 64-core processor (dual socket, 2 NUMA nodes,
> 64 CPUs on each node) to evaluate the performance with this patchset.
> I've implemented the syscall-based test cases as suggested in your previous mail. I'm shielding
> the 2nd NUMA node using isolcpus and nohz_full, and executing the tests on CPUs belonging to this node.
>
> Performance evaluation results (% gain over base kernel 6.5.0-rc5):
>
> Hot caches:
> fork-1       1.1%
> fork-1024   -3.8%
> mmap/munmap -1.5%
> mmap/lock   -4.7%
> open/close  -6.8%
> kill         3.3%
> mount      -13.0%
>
> Cold caches:
> fork-1       1.2%
> fork-1024   -7.2%
> mmap/munmap -1.6%
> mmap/lock   -1.0%
> open/close   4.6%
> kill       -54.2%
> mount       -8.5%
>
> Thanks,
> Shivank

Hi Shivank,

Thank you for the performance evaluation. Unfortunately, we don't have an AMD EPYC machine right now; I'll try to find a way to run the measurements and clarify why there is such a difference. We are currently trying to evaluate performance using database-related benchmarks and will return with the results once we have clarified this.

BR
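
For anyone who wants to reproduce the loop-timed cases, below is a minimal sketch of how
the fork/1024, mmap/munmap and mmap/lock loops described above could be implemented. The
original test sources were not posted in this thread, so the timing helper (now_ns) and
the exact loop structure are assumptions, not the code that produced the numbers.

/* Sketch of the fork/1024 and mmap/munmap loop benchmarks.
 * Assumed reconstruction of the methodology described above.
 * Build: gcc -O2 -o bench bench.c
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096          /* default page size, as in the description */
#endif

#define ITERS 1024

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* fork/1024: time the whole loop, one fork()+waitpid() per iteration */
static void bench_fork(void)
{
	long long start = now_ns();
	for (int i = 0; i < ITERS; i++) {
		pid_t pid = fork();
		if (pid == 0)
			_exit(0);       /* child exits immediately */
		waitpid(pid, NULL, 0);
	}
	printf("fork/%d: %lld ns\n", ITERS, now_ns() - start);
}

/* mmap/munmap: map and unmap one page per iteration;
 * pass MAP_LOCKED in extra_flags for the mmap/lock variant */
static void bench_mmap(int extra_flags)
{
	long long start = now_ns();
	for (int i = 0; i < ITERS; i++) {
		void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		munmap(p, PAGE_SIZE);
	}
	printf("mmap/munmap%s: %lld ns\n",
	       (extra_flags & MAP_LOCKED) ? " (MAP_LOCKED)" : "",
	       now_ns() - start);
}

int main(void)
{
	bench_fork();
	bench_mmap(0);
	bench_mmap(MAP_LOCKED);
	return 0;
}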
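
The kill case is the least obvious one, so here is a rough sketch of one way the SIGUSR1
round-trip could be measured: the parent records a timestamp in a shared anonymous mapping,
sends SIGUSR1 to the child, and the child's handler records the receive timestamp. The
synchronization through the shared "ready" flag is an assumption; the original tests may
synchronize parent and child differently.

/* Sketch of the kill/SIGUSR1 latency test described above.
 * Assumed reconstruction; the original code was not posted.
 * Build: gcc -O2 -o killbench killbench.c
 */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

struct shared {
	volatile long long sent_ns;
	volatile long long recv_ns;
	volatile int ready;             /* child has installed its handler */
};

static struct shared *sh;

static long long now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void handler(int sig)
{
	(void)sig;
	sh->recv_ns = now_ns();         /* timestamp taken inside the handler */
}

int main(void)
{
	/* Shared anonymous mapping visible to both parent and child. */
	sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (sh == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pid_t pid = fork();
	if (pid == 0) {
		/* Child: install the handler, announce readiness, spin until hit. */
		signal(SIGUSR1, handler);
		sh->ready = 1;
		while (!sh->recv_ns)
			;               /* busy-wait until the handler has run */
		_exit(0);
	}

	while (!sh->ready)              /* wait until the handler is installed */
		;
	sh->sent_ns = now_ns();
	kill(pid, SIGUSR1);
	waitpid(pid, NULL, 0);

	printf("kill latency: %lld ns\n", sh->recv_ns - sh->sent_ns);
	return 0;
}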