Re: [RFC 00/14] Dynamic Kernel Stacks

On 3/11/24 09:46, Pasha Tatashin wrote:
This is a follow-up to the LSF/MM proposal [1]. Please provide your
thoughts and comments about the dynamic kernel stacks feature. This is a
WIP that has not been tested beyond booting on some machines and running
the LKDTM thread exhaust tests. The series also lacks selftests and
documentation.

This feature allows the kernel stack to grow dynamically, from 4KiB up
to THREAD_SIZE. The intent is to save memory on fleet machines. Initial
experiments show an average saving of 70-75% of the kernel stack
memory.

The average stack depth of a kernel thread depends on the workload, profiling,
virtualization, compiler optimizations, and driver implementations.
However, the table below shows the amount of kernel stack memory before
vs. after on idling freshly booted machines:

CPU           #Cores #Stacks BASE(KiB) Dynamic(KiB)  Saving
AMD Genoa        384    5786    92576       23388    74.74%
Intel Skylake    112    3182    50912       12860    74.74%
AMD Rome         128    3401    54416       14784    72.83%
AMD Rome         256    4908    78528       20876    73.42%
Intel Haswell     72    2644    42304       10624    74.89%
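
The Saving column can be re-derived from the two memory columns as (BASE - Dynamic) / BASE. The following is a minimal sketch of that arithmetic, not part of the series; the helper names are hypothetical, and the per-stack figure assumes BASE is simply the stack count multiplied by a fixed stack size.

```python
# Rows copied from the table: (cpu, cores, stacks, base_kib, dynamic_kib, published_saving)
ROWS = [
    ("AMD Genoa",     384, 5786, 92576, 23388, 74.74),
    ("Intel Skylake", 112, 3182, 50912, 12860, 74.74),
    ("AMD Rome",      128, 3401, 54416, 14784, 72.83),
    ("AMD Rome",      256, 4908, 78528, 20876, 73.42),
    ("Intel Haswell",  72, 2644, 42304, 10624, 74.89),
]


def saving_percent(base_kib, dynamic_kib):
    """Percentage of kernel stack memory saved relative to the baseline."""
    return 100.0 * (base_kib - dynamic_kib) / base_kib


def per_stack_kib(stacks, total_kib):
    """Average memory per stack, assuming total = stacks * per-stack size."""
    return total_kib / stacks


if __name__ == "__main__":
    for cpu, cores, stacks, base, dyn, _published in ROWS:
        print(f"{cpu:<14} {cores:>4} cores  "
              f"saving {saving_percent(base, dyn):.2f}%  "
              f"base {per_stack_kib(stacks, base):.2f} KiB/stack  "
              f"dynamic {per_stack_kib(stacks, dyn):.2f} KiB/stack")
```

Under that assumption, every BASE value works out to exactly 16 KiB per stack, which suggests that this was the THREAD_SIZE on the measured machines.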

Some workloads that have millions of threads can benefit
significantly from this feature.


Ok, first of all, talking about "kernel memory" here is misleading. Unless your threads are spending nearly all their time sleeping, the threads will occupy stack and TLS memory in user space as well.

Second, non-dynamic kernel memory is one of the core design decisions in Linux from early on. This means there are a lot of deeply embedded assumptions which would have to be untangled.

Linus would, of course, be the real authority on this, but if someone asked me what the fundamental design philosophies of the Linux kernel are -- the design decisions which make Linux Linux, if you will -- I would say:

	1. Non-dynamic kernel memory
	2. Permanent mapping of physical memory
	3. Kernel API modeled closely after the POSIX API
	   (no complicated user space layers)
	4. Fast system call entry/exit (a necessity for a
	   kernel API based on simple system calls)
	5. Monolithic (but modular) kernel environment
	   (not cross-privilege, coroutine or message passing)

Third, *IF* this is something that should be done (and I personally strongly suspect it should not), at least on x86-64 it probably should be for FRED hardware only. With FRED, it is possible to set the #PF event stack level to 1, which will cause an automatic stack switch for #PF in kernel space (only). However, even in kernel space, #PF can sleep if it references a user space page, in which case it would have to be demoted back onto the ring 0 stack (there are multiple ways of doing that, but it does entail an overhead.)
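
To illustrate what "setting the #PF event stack level to 1" amounts to, here is a minimal sketch of the bit packing only. It assumes the layout given in the FRED architecture specification, where the IA32_FRED_STKLVLS MSR holds a 2-bit stack level for each of vectors 0-31, at bit position 2 * vector. The function names are hypothetical, and the sketch does not write any MSR.

```python
X86_TRAP_PF = 14  # architectural vector number of the page fault exception


def fred_stklvl(vector, level):
    """Return the STKLVLS bits selecting `level` for `vector` (assumed layout)."""
    if not 0 <= vector < 32:
        raise ValueError("only vectors 0-31 have a stack level field")
    if not 0 <= level < 4:
        raise ValueError("stack level is a 2-bit value (0-3)")
    return level << (2 * vector)


def fred_stklvl_of(stklvls, vector):
    """Extract the 2-bit stack level for `vector` from a STKLVLS value."""
    return (stklvls >> (2 * vector)) & 0x3


if __name__ == "__main__":
    stklvls = fred_stklvl(X86_TRAP_PF, 1)
    print(f"STKLVLS with #PF at level 1: {stklvls:#x}")
    print(f"#PF level read back: {fred_stklvl_of(stklvls, X86_TRAP_PF)}")
```

A real kernel would OR this field together with the levels already chosen for other vectors before writing the MSR.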

	-hpa



