Re: [PATCH v3 31/35] lib: add memory allocations report in show_mem()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 15 Feb 2024 18:16:48 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Thu, 15 Feb 2024 18:07:42 -0500
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> 
> >    text         data            bss     dec             hex filename
> > 29161847        18352730        5619716 53134293        32ac3d5 vmlinux.orig
> > 29162286        18382638        5595140 53140064        32ada60 vmlinux.memtag-off		(+5771)
> > 29230868        18887662        5275652 53394182        32ebb06 vmlinux.memtag			(+259889)
> > 29230746        18887662        5275652 53394060        32eba8c vmlinux.memtag-default-on	(+259767) dropped?
> > 29276214        18946374        5177348 53399936        32ed180 vmlinux.memtag-debug		(+265643)  
> 
> If you plan on running this in production, and this increases the size of
> the text by 68k, have you measured the I$ pressure that this may induce?
> That is, what is the full overhead of having this enabled, as it could
> cause more instruction cache misses?
> 
> I wonder if there has been measurements of it off. That is, having this
> configured in but default off still increases the text size by 68k. That
> can't be good on the instruction cache.
> 

I should have read the cover letter ;-)  (someone pointed me to that on IRC):

> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below are results
> from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
> 56 core Intel Xeon:

These are micro benchmarks, were any larger benchmarks taken? As
microbenchmarks do not always show I$ issues (because the benchmark itself
will warm up the cache). The cache issue could slow down tasks at a bigger
picture, as it can cause more cache misses.

Running other benchmarks under perf and recording the cache misses between
the different configs would be a good picture to show.

> 
>                         kmalloc                 pgalloc
> (1 baseline)            6.764s                  16.902s
> (2 default disabled)    6.793s (+0.43%)         17.007s (+0.62%)
> (3 default enabled)     7.197s (+6.40%)         23.666s (+40.02%)
> (4 runtime enabled)     7.405s (+9.48%)         23.901s (+41.41%)
> (5 memcg)               13.388s (+97.94%)       48.460s (+186.71%)


> 
> Memory overhead:
> Kernel size:
> 
>    text           data        bss         dec         diff
> (1) 26515311	      18890222    17018880    62424413
> (2) 26524728	      19423818    16740352    62688898    264485
> (3) 26524724	      19423818    16740352    62688894    264481
> (4) 26524728	      19423818    16740352    62688898    264485
> (5) 26541782	      18964374    16957440    62463596    39183

Similar to my builds.


> 
> Memory consumption on a 56 core Intel CPU with 125GB of memory:
> Code tags:           192 kB
> PageExts:         262144 kB (256MB)
> SlabExts:           9876 kB (9.6MB)
> PcpuExts:            512 kB (0.5MB)
> 
> Total overhead is 0.2% of total memory.


All this, and we are still worried about 4k for useful debugging :-/

-- Steve




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux