[LSF/MM/BPF TOPIC] Dynamic Growth of Kernel Stacks

Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> · Thu, 22 Feb 2024 20:03:37 -0500

For a long time, an 8K kernel stack was large enough. However, in 2014
the default stack size was increased to 16K [1]. To conserve memory at
Google, we have kept 8K stacks via a custom patch, verifying that our
workloads fit within this limit.

As we qualify new workloads and kernels, keeping the stacks at 8K is
becoming increasingly difficult, so we plan to move to the mainline
default of 16K. However, doubling the size of every kernel stack
significantly increases memory usage, potentially by petabytes.

With virtually mapped stacks [2], it is possible to implement
auto-growth on faults: the full stack range is reserved in virtual
memory, but pages are backed only when the stack actually reaches them.
Ideally, the vast majority of kernel threads would fit into 4K or 8K
stacks, with only a small number requiring deeper stacks that expand
as needed.

The complication is that a new page must always be available, even
when the fault occurs in interrupt context. To guarantee this, pages
must be obtainable atomically and without taking locks. This could be
achieved with a per-CPU supply of pages dedicated to handling
kernel-stack faults.

[1] https://lwn.net/Articles/600644
[2] https://lwn.net/Articles/692608