On Thu, 15 Feb 2024 18:16:48 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Thu, 15 Feb 2024 18:07:42 -0500
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> >     text      data      bss      dec      hex     filename
> > 29161847  18352730  5619716  53134293  32ac3d5  vmlinux.orig
> > 29162286  18382638  5595140  53140064  32ada60  vmlinux.memtag-off         (+5771)
> > 29230868  18887662  5275652  53394182  32ebb06  vmlinux.memtag             (+259889)
> > 29230746  18887662  5275652  53394060  32eba8c  vmlinux.memtag-default-on  (+259767) dropped?
> > 29276214  18946374  5177348  53399936  32ed180  vmlinux.memtag-debug       (+265643)
>
> If you plan on running this in production, and this increases the size of
> the text by 68k, have you measured the I$ pressure that this may induce?
> That is, what is the full overhead of having this enabled, as it could
> cause more instruction cache misses?
>
> I wonder if there have been measurements with it off. That is, having this
> configured in but default off still increases the text size by 68k. That
> can't be good for the instruction cache.

I should have read the cover letter ;-) (someone pointed me to that on IRC):

> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below are results
> from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
> 56 core Intel Xeon:

These are micro benchmarks; were any larger benchmarks run? Microbenchmarks
do not always expose I$ issues, because the benchmark itself will warm up
the cache. In the bigger picture, the extra text can slow down other tasks
by causing more instruction cache misses. Running larger benchmarks under
perf and recording the cache misses across the different configs would give
a much better picture (I sketched that, and the micro benchmark itself,
below my sig).

>
>                        kmalloc            pgalloc
> (1 baseline)            6.764s            16.902s
> (2 default disabled)    6.793s (+0.43%)   17.007s (+0.62%)
> (3 default enabled)     7.197s (+6.40%)   23.666s (+40.02%)
> (4 runtime enabled)     7.405s (+9.48%)   23.901s (+41.41%)
> (5 memcg)              13.388s (+97.94%)  48.460s (+186.71%)
>
> Memory overhead:
> Kernel size:
>
>         text      data       bss       dec    diff
> (1) 26515311  18890222  17018880  62424413
> (2) 26524728  19423818  16740352  62688898  264485
> (3) 26524724  19423818  16740352  62688894  264481
> (4) 26524728  19423818  16740352  62688898  264485
> (5) 26541782  18964374  16957440  62463596   39183

Similar to my builds.

>
> Memory consumption on a 56 core Intel CPU with 125GB of memory:
> Code tags:      192 kB
> PageExts:    262144 kB (256MB)
> SlabExts:      9876 kB (9.6MB)
> PcpuExts:       512 kB (0.5MB)
>
> Total overhead is 0.2% of total memory.

All this, and we are still worried about 4k for useful debugging :-/

-- 
Steve
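
The perf comparison I have in mind would be something like the below
(untested sketch; the events are the generic perf hardware events, so
availability depends on the PMU, and the kernel build is just a
placeholder workload):

  # Boot the config under test (baseline, memtag off, memtag on, ...),
  # run the same macro workload under perf on each boot, and compare
  # the instruction-cache miss rates between the configs:
  perf stat -a -r 5 \
      -e instructions,L1-icache-load-misses,iTLB-load-misses \
      -- make -j"$(nproc)" -C /path/to/some/tree   # placeholder workload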
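
And the micro benchmark described in the cover letter would look something
like this (untested sketch, not the submitters' actual test module; the
iteration counts and module scaffolding are made up, and the CPU frequency
and affinity pinning the cover letter mentions would be done from user
space before loading it):

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/gfp.h>
#include <linux/ktime.h>

static int __init alloc_bench_init(void)
{
	ktime_t t0;
	size_t size;
	int i;

	/* kmalloc/kfree with allocation sizes growing from 8 to 240 bytes */
	t0 = ktime_get();
	for (i = 0; i < 100000; i++) {
		for (size = 8; size <= 240; size += 8) {
			void *p = kmalloc(size, GFP_KERNEL);

			if (!p)
				return -ENOMEM;
			kfree(p);
		}
	}
	pr_info("kmalloc: %lld ns\n", ktime_to_ns(ktime_sub(ktime_get(), t0)));

	/* get_free_page/free_page side of the test */
	t0 = ktime_get();
	for (i = 0; i < 100000; i++) {
		unsigned long page = __get_free_page(GFP_KERNEL);

		if (!page)
			return -ENOMEM;
		free_page(page);
	}
	pr_info("pgalloc: %lld ns\n", ktime_to_ns(ktime_sub(ktime_get(), t0)));

	return 0;
}

static void __exit alloc_bench_exit(void)
{
}

module_init(alloc_bench_init);
module_exit(alloc_bench_exit);
MODULE_LICENSE("GPL");

Since module init runs in the context of the loading task, something like
"taskset -c <cpu> insmod alloc_bench.ko" would pin the loops to one CPU,
and the timings land in dmesg.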