On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> On 3/11/24, Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:
> > This is follow-up to the LSF/MM proposal [1]. Please provide your
> > thoughts and comments about dynamic kernel stacks feature. This is a WIP
> > has not been tested beside booting on some machines, and running LKDTM
> > thread exhaust tests. The series also lacks selftests, and
> > documentations.
> >
> > This feature allows to grow kernel stack dynamically, from 4KiB and up
> > to the THREAD_SIZE. The intend is to save memory on fleet machines. From
> > the initial experiments it shows to save on average 70-75% of the kernel
> > stack memory.
> >
>

Hi Mateusz,

> Can you please elaborate how this works? I have trouble figuring it
> out from cursory reading of the patchset and commit messages, that
> aside I would argue this should have been explained in the cover
> letter.

Sure, I answered your questions below.

> For example, say a thread takes a bunch of random locks (most notably
> spinlocks) and/or disables preemption, then pushes some stuff onto the
> stack which now faults. That is to say the fault can happen in rather
> arbitrary context.
>
> If any of the conditions described below are prevented in the first
> place it really needs to be described how.
>
> That said, from top of my head:
> 1. what about faults when the thread holds a bunch of arbitrary locks
> or has preemption disabled? is the allocation lockless?

Each thread has a stack with 4 pages.
Pre-allocated page: This page is always allocated and mapped at thread
creation.
Dynamic pages (3): These pages are mapped dynamically upon stack faults.

A per-CPU data structure holds 3 dynamic pages for each CPU. These pages
are used to handle stack faults taken by the thread running on that CPU
(even in interrupt-disabled contexts).
Typically, only one page is needed, but in the rare case where the
thread accesses beyond that, we might use up to all three pages in a
single fault. This structure allows for atomic handling of stack faults,
preventing conflicts from other processes. Additionally, the thread's
16K-aligned virtual address (VA) and guaranteed pre-allocated page mean
that no page table allocation is required during the fault.

When a thread leaves the CPU in normal kernel mode, we check a flag to
see if it has experienced stack faults. If so, we charge the thread for
the new stack pages and refill the per-CPU data structure with any
missing pages.

> 2. what happens if there is no memory from which to map extra pages in
> the first place? you may be in position where you can't go off cpu

When the per-CPU data structure cannot be refilled, and a new thread
faults, we issue a message indicating a critical stack fault. This
triggers a system-wide panic similar to a guard page access violation.

Pasha
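
For readers following the thread, the scheme described above can be
modelled in user space roughly as follows. This is only an illustrative
sketch, not code from the patch series: every name here (thread_stack,
cpu_pool, stack_fault, switch_out, ...) is hypothetical, malloc() stands
in for the page allocator, and a "false" return stands in for the panic.
It also assumes that a fault on a lower page maps every still-unmapped
page above it, which is how "up to all three pages in a single fault" is
read here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE     4096u
#define STACK_PAGES   4                  /* 16K stack, as described above */
#define DYNAMIC_PAGES (STACK_PAGES - 1)  /* 3 pages mapped on demand */

/* Hypothetical model of one thread's stack; the top slot is pre-allocated. */
struct thread_stack {
	void *page[STACK_PAGES];  /* NULL means "not mapped yet" */
	bool faulted;             /* set on fault, checked at switch-out */
};

/* Hypothetical per-CPU reserve that the fault path draws from. */
struct cpu_pool {
	void *page[DYNAMIC_PAGES];
};

/* Refill empty slots; may allocate, so it must not run in the fault path. */
static bool pool_refill(struct cpu_pool *pool)
{
	bool full = true;

	for (int i = 0; i < DYNAMIC_PAGES; i++) {
		if (!pool->page[i])
			pool->page[i] = malloc(PAGE_SIZE);
		if (!pool->page[i])
			full = false;
	}
	return full;
}

/* Number of pages currently held in the reserve. */
static int pool_count(const struct cpu_pool *pool)
{
	int n = 0;

	for (int i = 0; i < DYNAMIC_PAGES; i++)
		n += pool->page[i] != NULL;
	return n;
}

/* Take one page from the reserve without allocating. */
static void *pool_take(struct cpu_pool *pool)
{
	for (int i = 0; i < DYNAMIC_PAGES; i++) {
		if (pool->page[i]) {
			void *p = pool->page[i];

			pool->page[i] = NULL;
			return p;
		}
	}
	return NULL;
}

/* Thread creation: only the top page is allocated up front. */
static bool thread_stack_init(struct thread_stack *ts)
{
	memset(ts, 0, sizeof(*ts));
	ts->page[STACK_PAGES - 1] = malloc(PAGE_SIZE);
	return ts->page[STACK_PAGES - 1] != NULL;
}

/*
 * Fault on dynamic page idx (0 is the lowest address). Maps every
 * unmapped page from just below the pre-allocated one down to idx,
 * using only the reserve. Returns false where the described design
 * would report a critical stack fault and panic.
 */
static bool stack_fault(struct thread_stack *ts, struct cpu_pool *pool, int idx)
{
	assert(idx >= 0 && idx < DYNAMIC_PAGES);

	for (int i = DYNAMIC_PAGES - 1; i >= idx; i--) {
		if (ts->page[i])
			continue;
		ts->page[i] = pool_take(pool);
		if (!ts->page[i])
			return false;
		ts->faulted = true;
	}
	return true;
}

/* Switch-out: if the thread faulted, account for it and refill the reserve. */
static bool switch_out(struct thread_stack *ts, struct cpu_pool *pool)
{
	if (!ts->faulted)
		return true;
	ts->faulted = false;  /* charging the thread would happen here */
	return pool_refill(pool);
}
```

In this model the fault path only moves pointers out of the per-CPU
reserve, so it never allocates or sleeps; all allocation is deferred to
switch_out(), where the thread is in a context that can tolerate it.
The sketch never frees pages, which is fine for illustration only.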