On Thu, Feb 22, 2024, 17:04 Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
>
> For a long time, an 8K kernel stack was large enough. However, since
> 2014, the default stack size has increased to 16K [1]. To conserve
> memory at Google, we maintained 8K stacks via a custom patch while
> verifying that our workloads could fit within this limit.
>
> As we qualify new workloads and kernels, we find it increasingly
> difficult to keep the stacks at 8K. Therefore, we will increase the
> stack size to the mainline value of 16K. However, this translates to
> a significant increase in memory usage, potentially measured in
> petabytes.
>
> With virtually mapped stacks [2], it is possible to implement
> auto-growth on faults. Ideally, the vast majority of kernel threads
> could fit into 4K or 8K stacks, with only a small number requiring
> deeper stacks that would expand as needed.
>
> The complication is that new pages must always be available from
> within an interrupt context. To ensure this, pages must be accessible
> to kernel threads in an atomic and lockless manner. This could be
> achieved by using a per-CPU supply of pages dedicated to handling
> kernel-stack faults.
>
> [1] https://lwn.net/Articles/600644
> [2] https://lwn.net/Articles/692608

Hi Pasha,

I wonder if this is another potential use case for bringing back
cleancache, as proposed in [1]? The idea would be that every kernel
stack still gets a full 16KB allocation, but with only one page
accessible and the remaining pages available as cleancache. A fault
on one of those pages could then be handled by discarding the
cleancache page and remapping it as r/w.

Peter

[1] https://lore.kernel.org/all/ZdSMbjGf2Fj98diT@raptor/
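For concreteness, a minimal sketch of the per-CPU page reserve Pasha
describes might look like the following. All names here (stack_pool,
stack_pool_pop, stack_pool_refill) are illustrative rather than an
existing kernel API, and the step that actually maps the page into
the stack's vmalloc area is left out:

#include <linux/percpu.h>
#include <linux/gfp.h>
#include <linux/irqflags.h>
#include <linux/mm.h>

/* Enough pages to grow one stack from 4K to 16K. */
#define STACK_POOL_PAGES	4

struct stack_pool {
	struct page	*pages[STACK_POOL_PAGES];
	int		nr;
};

static DEFINE_PER_CPU(struct stack_pool, stack_pool);

/*
 * Called from the stack fault handler with IRQs disabled, so plain
 * per-CPU access is atomic and lockless on this CPU.
 */
static struct page *stack_pool_pop(void)
{
	struct stack_pool *sp = this_cpu_ptr(&stack_pool);

	if (!sp->nr)
		return NULL;	/* reserve exhausted */
	return sp->pages[--sp->nr];
}

/* Refill later from a sleepable context, e.g. a per-CPU work item. */
static void stack_pool_refill(void)
{
	struct stack_pool *sp;
	struct page *page;

	while ((page = alloc_page(GFP_KERNEL))) {
		local_irq_disable();
		sp = this_cpu_ptr(&stack_pool);
		if (sp->nr >= STACK_POOL_PAGES) {
			local_irq_enable();
			__free_page(page);
			break;
		}
		sp->pages[sp->nr++] = page;
		local_irq_enable();
	}
}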
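Similarly, the fault path for the cleancache variant could look
roughly like this, assuming a revived cleancache API. The
cleancache_reclaim_page() and stack_remap_page_rw() helpers are
hypothetical, since the original cleancache was removed in v5.17 and
the proposal in [1] is still under discussion:

#include <linux/mm.h>
#include <linux/vmalloc.h>

static bool handle_stack_fault(unsigned long addr)
{
	struct page *page = vmalloc_to_page((void *)(addr & PAGE_MASK));

	if (!page)
		return false;

	/* Discard the clean, reclaimable copy backing this page... */
	if (!cleancache_reclaim_page(page))		/* hypothetical */
		return false;

	/* ...and remap it r/w in the stack's mapping. */
	stack_remap_page_rw(addr & PAGE_MASK, page);	/* hypothetical */

	return true;
}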