...
> - exit_to_user_mode(): Unmap the extra three pages and return them to
> the per-CPU cache. This function is called late in the kernel exit
> path.

Why bother?
The number of tasks running in user_mode is limited to the number of CPUs.
So the most you save is a few pages per CPU.

Plausibly a context switch from an interrupt (e.g. timer tick) could
suspend a task without saving anything on its kernel stack.
But how common is that in reality?
In a well-behaved system most user threads will be sleeping on
some event - so with an active kernel stack.

I can also imagine that something like sys_epoll() actually
sleeps without (that) much stack allocated.
But the calls into all the drivers to check the status
could easily go into another page.
You really wouldn't want to keep allocating and deallocating
physical pages (which I'm sure has TLB flushing costs)
all the time for those processes.

Perhaps a 'garbage collection' activity that reclaims stack
pages from processes that have been asleep 'for a while' or
haven't used a lot of stack recently (if the hardware 'page accessed'
bit can be used) might make more sense.

Have you done any instrumentation to see which system calls
are actually using more than (say) 8k of stack?
And how often do the user threads that make those calls do so?

	David