On Mon, Mar 18, 2024 at 11:39 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> ...
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
>
> Why bother?
> The number of tasks running in user_mode is limited to the number
> of cpu. So the most you save is a few pages per cpu.
>
> Plausibly a context switch from an interrupt (eg timer tick)
> could suspend a task without saving anything on its kernel stack.
> But how common is that in reality?
> In a well behaved system most user threads will be sleeping on
> some event - so with an active kernel stack.
>
> I can also imagine that something like sys_epoll() actually
> sleeps with not (that much) stack allocated.
> But the calls into all the drivers to check the status
> could easily go into another page.
> You really wouldn't want to keep allocating and deallocating
> physical pages (which I'm sure has TLB flushing costs)
> all the time for those processes.
>
> Perhaps a 'garbage collection' activity that reclaims stack
> pages from processes that have been asleep 'for a while' or
> haven't used a lot of stack recently (if hw 'page accessed'
> bit can be used) might make more sense.
>
> Have you done any instrumentation to see which system calls
> are actually using more than (say) 8k of stack?
> And how often the user threads that make those calls do so?

None of our syscalls, AFAIK.

Pasha

>
> David