On Fri, Jun 14, 2019 at 04:13:31PM +0200, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>
> > From: Andrei Vagin <avagin@xxxxxxxxx>
> >
> > After performance testing of the VDSO patches, a noticeable 20%
> > regression was found in the gettime_perf selftest with a cold cache.
> > As it turns out, before the introduction of time namespaces, VDSO
> > functions were quite well aligned to cache lines, but adding new code
> > to adjust the timens offset inside a namespace created a small shift,
> > and the vdso functions became unaligned on cache lines.
> >
> > Align the vdso functions with a gcc option to fix the performance
> > drop.
> >
> > Copying the resulting numbers from the cover letter:
> >
> > Hot CPU cache (more gettime_perf.c cycles - the better):
> >         | before    | CONFIG_TIME_NS=n | host      | inside timens
> > --------|-----------|------------------|-----------|--------------
> > cycles  | 139887013 | 139453003        | 139899785 | 128792458
> > diff (%)| 100       | 99.7             | 100       | 92
>
> Why is CONFIG_TIME_NS=n behaving worse than current mainline and
> worse than 'host' mode?

I should have specified the precision of these numbers: it is larger
than this 0.3%, so at the time I decided there was nothing to worry
about. I did those measurements a few months ago for the second
version of this series. I have repeated the measurements for this set
of patches:

        | before    | CONFIG_TIME_NS=n | host      | inside timens
------------------------------------------------------------------
        | 144645498 | 142916801        | 140364862 | 132378440
        | 143440633 | 141545739        | 140540053 | 132714190
        | 144876395 | 144650599        | 140026814 | 131843318
        | 143984551 | 144595770        | 140359260 | 131683544
        | 144875682 | 143799788        | 140692618 | 131300332
------------------------------------------------------------------
avg     | 144364551 | 143501739        | 140396721 | 131983964
diff %  | 100       | 99.4             | 97.2      | 91.4
------------------------------------------------------------------
stdev % | 0.4       | 0.9              | 0.1       | 0.4

> > Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
> >         | before | CONFIG_TIME_NS=n | host  | inside timens
> > --------|--------|------------------|-------|--------------
> > tsc     | 6748   | 6718             | 6862  | 12682
> > diff (%)| 100    | 99.6             | 101.7 | 188
>
> Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host'
> mode drops.

The precision of these numbers is much lower than that of the previous
set. These numbers are for the second version of this series, so I
decided to repeat the measurements for this version. When I ran the
test, I found that there was some degradation compared with v5.0. I
bisected it and found that the problem is in 2b539aefe9e4
("mm/resource: Let walk_system_ram_range() search child resources").

At this point, I realized that my test wasn't quite right. On each
iteration, the test starts a new process, then does
start=rdtsc(); clock_gettime(); end=rdtsc() and prints (end-start).
The problem here is that when clock_gettime() is called for the first
time, the vdso pages are not yet mapped into the process address
space, so the test really measures how fast the vdso pages are mapped
into the process address space.

I modified the test: now it uses the clflush instruction to drop CPU
caches before each measurement.
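To illustrate the approach, here is a minimal sketch of such a
cold-cache measurement. It is not the actual selftest (that one is
linked below); the flush size, the 64-byte cache line and the use of
CLOCK_MONOTONIC are assumptions of this sketch:

	/* Build with: gcc -O2 -o gettime_cold_sketch gettime_cold_sketch.c
	 * x86-only: relies on clflush/mfence/rdtsc. */
	#include <stdio.h>
	#include <stdint.h>
	#include <time.h>
	#include <sys/auxv.h>	/* getauxval(), AT_SYSINFO_EHDR */
	#include <x86intrin.h>	/* _mm_clflush(), _mm_mfence(), __rdtsc() */

	#define CACHE_LINE	64	/* assumed cache line size */
	#define FLUSH_SIZE	4096	/* first vdso page; the real test
					 * would flush the whole mapping */

	static void flush_range(char *p, size_t len)
	{
		size_t i;

		/* Evict the vdso code from all cache levels, one line
		 * at a time, and wait for the flushes to complete. */
		for (i = 0; i < len; i += CACHE_LINE)
			_mm_clflush(p + i);
		_mm_mfence();
	}

	int main(void)
	{
		char *vdso = (char *)getauxval(AT_SYSINFO_EHDR);
		struct timespec ts;
		uint64_t start, end;
		int i;

		/* Warm-up call: fault the vdso pages in up front, so the
		 * loop measures cold caches, not cold page tables. */
		clock_gettime(CLOCK_MONOTONIC, &ts);

		for (i = 0; i < 10000; i++) {
			flush_range(vdso, FLUSH_SIZE);
			start = __rdtsc();
			clock_gettime(CLOCK_MONOTONIC, &ts);
			end = __rdtsc();
			printf("%llu\n", (unsigned long long)(end - start));
		}
		return 0;
	}

Each printed value is then the cost of a clock_gettime() call that has
to refill the caches (a more careful version would also serialize the
rdtsc reads with lfence).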
Here are the results:

           | before | CONFIG_TIME_NS=n | host  | inside timens
---------------------------------------------------------------
tsc        | 434    | 433              | 437   | 477
stdev(tsc) | 5      | 5                | 5     | 3
diff (%)   | 100    | 99.8             | 100.1 | 109

Here is the source code of the modified test:
https://github.com/avagin/linux-task-diag/blob/wip/timens-rfc-v4/tools/testing/selftests/timens/gettime_perf_cold.c

The test does 10K iterations. At first glance, the numbers look noisy,
so I sort them and take only the 8K numbers in the middle:

$ ./gettime_perf_cold > raw
$ cat raw | sort -n | tail -n 9000 | head -n 8000 > results

> Either I'm misreading the numbers or missing something or I'm just
> confused as usual :)
>
> Thanks,
>
>	tglx