On Thu, Feb 22, 2024, 17:04 Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
>
> For a long time, an 8K kernel stack was large enough. However, since
> 2014, the default stack size has increased to 16K [1]. To conserve
> memory at Google, we maintained 8K stacks via a custom patch while
> verifying that our workloads could fit within this limit.
>
> As we qualify new workloads and kernels, we find it increasingly
> difficult to keep the stacks at 8K. Therefore, we will increase the
> stack size to the mainline value of 16K. However, this translates to
> a significant increase in memory usage, potentially measured in
> petabytes.
>
> With virtually mapped stacks [2], it is possible to implement
> auto-growth on faults. Ideally, the vast majority of kernel threads
> could fit into 4K or 8K stacks, with only a small number requiring
> deeper stacks that would expand as needed.
>
> The complication is that new pages must always be available from
> within an interrupt context. To ensure this, pages must be accessible
> to kernel threads in an atomic and lockless manner. This could be
> achieved by using a per-CPU supply of pages dedicated to handling
> kernel-stack faults.
>
> [1] https://lwn.net/Articles/600644
> [2] https://lwn.net/Articles/692608

Hi Pasha,

I wonder if this is another potential use case for bringing back
cleancache, as proposed in [1]? The idea would be that every kernel
stack still gets a full 16KB allocation, but with only one page
accessible and the remaining pages available as cleancache. A fault
on one of those pages could then be handled by discarding the
cleancache page and remapping it as r/w.

Peter

[1] https://lore.kernel.org/all/ZdSMbjGf2Fj98diT@raptor/
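For concreteness, a minimal sketch of the per-CPU page reserve Pasha
describes might look like the following. All names here (stack_pool,
stack_pool_pop, stack_pool_refill) are illustrative rather than an
existing kernel API, and the step that actually maps the page into
the stack's vmalloc area is left out:

#include <linux/percpu.h>
#include <linux/gfp.h>
#include <linux/irqflags.h>
#include <linux/mm.h>

/* Enough pages to grow one stack from 4K to 16K. */
#define STACK_POOL_PAGES	4

struct stack_pool {
	struct page	*pages[STACK_POOL_PAGES];
	int		nr;
};

static DEFINE_PER_CPU(struct stack_pool, stack_pool);

/*
 * Called from the stack fault handler with IRQs disabled, so plain
 * per-CPU access is atomic and lockless on this CPU.
 */
static struct page *stack_pool_pop(void)
{
	struct stack_pool *sp = this_cpu_ptr(&stack_pool);

	if (!sp->nr)
		return NULL;	/* reserve exhausted */
	return sp->pages[--sp->nr];
}

/* Refill later from a sleepable context, e.g. a per-CPU work item. */
static void stack_pool_refill(void)
{
	struct stack_pool *sp;
	struct page *page;

	while ((page = alloc_page(GFP_KERNEL))) {
		local_irq_disable();
		sp = this_cpu_ptr(&stack_pool);
		if (sp->nr >= STACK_POOL_PAGES) {
			local_irq_enable();
			__free_page(page);
			break;
		}
		sp->pages[sp->nr++] = page;
		local_irq_enable();
	}
}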
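Similarly, the fault path for the cleancache variant could look
roughly like this, assuming a revived cleancache API. The
cleancache_reclaim_page() and stack_remap_page_rw() helpers are
hypothetical, since the original cleancache was removed in v5.17 and
the proposal in [1] is still under discussion:

#include <linux/mm.h>
#include <linux/vmalloc.h>

static bool handle_stack_fault(unsigned long addr)
{
	struct page *page = vmalloc_to_page((void *)(addr & PAGE_MASK));

	if (!page)
		return false;

	/* Discard the clean, reclaimable copy backing this page... */
	if (!cleancache_reclaim_page(page))		/* hypothetical */
		return false;

	/* ...and remap it r/w in the stack's mapping. */
	stack_remap_page_rw(addr & PAGE_MASK, page);	/* hypothetical */

	return true;
}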