Re: Analysis of the overhead of frame pointers on gcc compiles

On 10/20/23 13:23, Richard W.M. Jones wrote:
> Today I've read (twice) that the overhead of frame pointers on the
> runtime of the compiler, GCC, is 10%.  This number is nonsense.  The
> actual overhead is 1%, and I have done the tests that show this.

Both the 1% and the 10% results can be valid.  In particular, I have seen
variance of up to 15% in CPU time for consecutive runs of the same
CPU-saturating task on the SAME physical machine.  The cause is that Linux
does not consider cache coloring when allocating physical page frames for
virtual memory, which has random effects on the performance of the data
cache.  See https://en.wikipedia.org/wiki/Cache_coloring :

    "A virtual memory subsystem that lacks cache coloring is less
    deterministic with regards to cache performance, as differences in
    page allocation from one program run to the next can lead to large
    differences in program performance"

    "Page coloring is employed in operating systems such as Solaris,
    FreeBSD, NetBSD and Windows NT."
[Note the conspicuous absence of Linux from that list.]
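
To make the coloring idea concrete, here is a minimal sketch of how a page color is usually defined for a physically indexed cache with simple modulo indexing.  The geometry in the usage lines (a 256 KiB, 4-way cache and 4 KiB pages) and the function names are assumptions for illustration only; they are not taken from any kernel.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages


def num_page_colors(cache_bytes, ways, page_size=PAGE_SIZE):
    """Number of page colors: the bytes covered by one way, divided by the page size."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_size)


def page_color(phys_addr, cache_bytes, ways, page_size=PAGE_SIZE):
    """Color of the physical page frame that contains phys_addr."""
    frame_number = phys_addr // page_size
    return frame_number % num_page_colors(cache_bytes, ways, page_size)


# Illustrative geometry: 256 KiB, 4-way set associative.
print(num_page_colors(256 * 1024, 4))         # 16
print(page_color(0x12345000, 256 * 1024, 4))  # 5
```

Pages of the same color map to the same cache sets.  An allocator that ignores color hands out frames with an essentially arbitrary mix of colors, so the number of conflict misses can differ from one run to the next.  The simple modulo model does not describe caches that hash the address to pick a set.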

Other sources of real-time contention should be considered, too.
Queuing delays in a file system due to encryption, journaling, block
allocation and placement, etc., might mask real-time measurement of CPU+cache.
Any potentially competing activity, such as a graphical desktop environment,
use of network, video, or audio, cron jobs, or tasks controlled by systemd,
should be minimized.  It may be best to measure when the machine
has been booted to single-user mode.
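
One way to see how much run-to-run variance a given machine has is to repeat the same command several times and compare the CPU time consumed.  The following is a rough sketch under my own assumptions (Unix only, helper names invented here, and the command to run is whatever you choose); it is not the procedure behind either the 1% or the 10% figure.

```python
import resource
import statistics
import subprocess


def child_cpu_seconds(argv):
    """Run argv once and return the user+system CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ((after.ru_utime - before.ru_utime)
            + (after.ru_stime - before.ru_stime))


def summarize(samples):
    """Return (mean, sample stdev, spread relative to the fastest run).

    All samples are expected to be positive.
    """
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    fastest = min(samples)
    spread = (max(samples) - fastest) / fastest
    return mean, stdev, spread


def measure(argv, runs=10):
    """Run argv `runs` times and summarize the CPU time per run."""
    return summarize([child_cpu_seconds(argv) for _ in range(runs)])
```

For example, `measure(["gcc", "-O2", "-c", "foo.c"])`, where `foo.c` is a placeholder for your own test input.  If the spread between identical runs is already near 15%, a 1% difference between two builds cannot be resolved without many more runs.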

Because of the impact of data cache performance, it is important to
state the CPU, RAM, and cache characteristics when measuring performance.
For example, here is the beginning of /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 94
model name	: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
stepping	: 3
microcode	: 0xf0
cpu MHz		: 3418.725
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
On Intel x86_64 Core CPUs with hyperthreading, the two threads
per core compete for that core's 256KiB L2 cache.
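
If you want to capture these fields automatically alongside a benchmark result, something like the sketch below may do.  The function names are made up for this example, and it reads only the first processor stanza.  Comparing "siblings" with "cpu cores" indicates whether more than one hardware thread per core is active.

```python
def first_cpu_fields(cpuinfo_text):
    """Parse the first processor stanza of /proc/cpuinfo text into a dict."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break      # a blank line ends the first stanza
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def smt_active(fields):
    """True if the package exposes more logical CPUs than physical cores."""
    return int(fields["siblings"]) > int(fields["cpu cores"])


def describe_cpu(path="/proc/cpuinfo"):
    """Return the fields worth quoting next to a measurement."""
    with open(path) as f:
        info = first_cpu_fields(f.read())
    wanted = ("model name", "microcode", "cache size", "siblings", "cpu cores")
    return {key: info.get(key) for key in wanted}
```

For the listing above, `smt_active` would return False, since siblings and cpu cores are both 4.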

On x86_64, the CPUID instruction reports the cache organization,
which can be decoded into output such as:
 22 GenuineIntel

TLB/Cache:  eax=76036301  ebx=00f0b6ff  ecx=00000000  edx=00c30000
  1  repeat for more info
63 03 dTLB: 4 KByte pages, 4-way, 64 entries
 76 iTLB: 2M/4M pages, fully associative, 8 entries
 ff Use CPUID leaf 4
b6 f0 64-byte prefetching c3
Cache:  eax=1c004121  ebx=01c0003f  ecx=0000003f  edx=00000000
  1 Data Cache
  1 Cache Level (starts at 1)
  1 Self-initializing
  0 Fully associative
  2 max # logical processors
  8 max # physical cores
 64 system coherency line size
  1 physical line partitions
  8 ways of associativity
 64 number of sets
32768 total size
  0 WBINVD/INVD acts on this level only
  0 cache includes lower levels
  0 complex cache indexing

Cache:  eax=1c004122  ebx=01c0003f  ecx=0000003f  edx=00000000
  2 Instruction Cache
  1 Cache Level (starts at 1)
  1 Self-initializing
  0 Fully associative
  2 max # logical processors
  8 max # physical cores
 64 system coherency line size
  1 physical line partitions
  8 ways of associativity
 64 number of sets
32768 total size
  0 WBINVD/INVD acts on this level only
  0 cache includes lower levels
  0 complex cache indexing

Cache:  eax=1c004143  ebx=00c0003f  ecx=000003ff  edx=00000000
  3 Unified Cache
  2 Cache Level (starts at 1)
  1 Self-initializing
  0 Fully associative
  2 max # logical processors
  8 max # physical cores
 64 system coherency line size
  1 physical line partitions
  4 ways of associativity
1024 number of sets
262144 total size
  0 WBINVD/INVD acts on this level only
  0 cache includes lower levels
  0 complex cache indexing

Cache:  eax=1c03c163  ebx=02c0003f  ecx=00001fff  edx=00000006
  3 Unified Cache
  3 Cache Level (starts at 1)
  1 Self-initializing
  0 Fully associative
 16 max # logical processors
  8 max # physical cores
 64 system coherency line size
  1 physical line partitions
 12 ways of associativity
8192 number of sets
6291456 total size
  0 WBINVD/INVD acts on this level only
  1 cache includes lower levels
  1 complex cache indexing

Cache:  eax=00000000  ebx=00000000  ecx=00000000  edx=00000000
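
For anyone who wants to check the decoding, here is a small sketch of how the four registers of each "Cache:" entry (CPUID leaf 4) turn into the numbers listed.  The bit positions follow my reading of Intel's documented layout for that leaf; the function and key names are my own and this is not the tool that produced the output above.

```python
CACHE_TYPES = {1: "Data", 2: "Instruction", 3: "Unified"}


def decode_cpuid_leaf4(eax, ebx, ecx, edx):
    """Decode one CPUID leaf 4 sub-leaf; return None for the terminating entry."""
    cache_type = eax & 0x1F
    if cache_type == 0:
        return None                              # no more caches
    line_size = (ebx & 0xFFF) + 1                # EBX bits 11:0, stored minus 1
    partitions = ((ebx >> 12) & 0x3FF) + 1       # EBX bits 21:12
    ways = ((ebx >> 22) & 0x3FF) + 1             # EBX bits 31:22
    sets = ecx + 1                               # ECX holds sets minus 1
    return {
        "type": CACHE_TYPES.get(cache_type, "Unknown"),
        "level": (eax >> 5) & 0x7,
        "self_initializing": (eax >> 8) & 1,
        "fully_associative": (eax >> 9) & 1,
        "max_logical_processors": ((eax >> 14) & 0xFFF) + 1,
        "max_physical_cores": ((eax >> 26) & 0x3F) + 1,
        "line_size": line_size,
        "partitions": partitions,
        "ways": ways,
        "sets": sets,
        "total_size": ways * partitions * line_size * sets,
        "wbinvd_this_level_only": edx & 1,
        "inclusive": (edx >> 1) & 1,
        "complex_indexing": (edx >> 2) & 1,
    }


# Register values copied from the L2 entry in the listing above.
l2 = decode_cpuid_leaf4(0x1C004143, 0x00C0003F, 0x000003FF, 0x00000000)
print(l2["level"], l2["ways"], l2["sets"], l2["total_size"])  # 2 4 1024 262144
```

The total size is simply ways * partitions * line size * sets, which is how the 32768, 262144, and 6291456 figures arise.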



