On Mon, Mar 06, 2023 at 03:57:30PM -0800, Mike Kravetz wrote:
> One of our product teams recently experienced 'memory bloat' in their
> environment. The application in this environment is the JVM which
> creates hundreds of threads. Threads are ultimately created via
> pthread_create which also creates the thread stacks. pthread attributes
> are modified so that stacks are 2MB in size. It just so happens that
> due to allocation patterns, all their stacks are at 2MB boundaries. The
> system has THP always set, so a huge page is allocated at the first
> (write) fault when libpthread initializes the stack.
>
> It would seem that this is expected behavior. If you set THP always,
> you may get huge pages anywhere.
>
> However, I can't help but think that backing stacks with huge pages by
> default may not be the right thing to do. Stacks by their very nature
> grow in somewhat unpredictable ways over time. Using a large virtual
> space so that memory is allocated as needed is the desired behavior.
>
> The only way to address their 'memory bloat' via thread stacks today is
> by switching THP to madvise.
>
> Just wondering if there is anything better or more selective that can be
> done? Does it make sense to have THP backed stacks by default? If not,
> who would be best at disabling? A couple thoughts:
> - The kernel could disable huge pages on stacks. libpthread/glibc pass
>   the unused flag MAP_STACK. We could key off this and disable huge pages.
>   However, I'm sure there is somebody somewhere today that is getting better
>   performance because they have huge pages backing their stacks.
> - We could push this to glibc/libpthreads and have them use
>   MADV_NOHUGEPAGE on thread stacks. However, this also has the potential
>   of regressing performance if somebody somewhere is getting better
>   performance due to huge pages.

Yes, to me it seems it's never safe to change a default behavior. For
stacks I really can't tell why it must be different here.
I assume the problem is the wasted space, which is easily amplified with
N threads. But IIUC it'll be the same as THP for normal memory, e.g.,
there can be a per-thread mmap() of 2MB even if only 4K of each is used;
if such an mmap() is populated by THP for each thread there'll also be a
huge waste.

> - Other thoughts?
>
> Perhaps this is just expected behavior of THP always which is unfortunate
> in this situation.

I would think it's proper for the app to explicitly choose what it wants
if possible, and we do have the interfaces.

Then, would pthread_attr_getstack() plus MADV_NOHUGEPAGE work, applied
from the JVM framework level?

Thanks,

-- 
Peter Xu
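
For illustration only, here is a rough sketch of what "pthread_attr_getstack()
plus MADV_NOHUGEPAGE" could look like in userspace. It is not taken from the
JVM or glibc. The helper names run_on_small_page_stack() and touch_stack()
are made up for this example, and the 2MB size simply mirrors the report.
One assumption worth stating: as far as I understand, the advice only
affects future faults, so the sketch allocates the stack itself and applies
the advice before the thread ever touches it, rather than calling madvise()
on a stack glibc has already initialized.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define STACK_SIZE (2UL << 20)	/* 2MB, mirroring the report */

/* Hypothetical worker: touches a little stack and hands its argument back. */
void *touch_stack(void *arg)
{
	volatile char buf[4096];

	buf[0] = 1;
	(void)buf[0];
	return arg;
}

/*
 * Hypothetical helper: run fn(arg) on a caller-allocated stack that has
 * been marked MADV_NOHUGEPAGE before first use.  Returns 0 on success.
 * Note that a caller-provided stack gets no guard page from glibc.
 */
int run_on_small_page_stack(void *(*fn)(void *), void *arg, void **ret)
{
	pthread_attr_t attr;
	pthread_t tid;
	void *stack, *addr;
	size_t size;
	int err;

	stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	err = pthread_attr_init(&attr);
	if (err)
		goto out_unmap;

	err = pthread_attr_setstack(&attr, stack, STACK_SIZE);
	if (!err)
		err = pthread_attr_getstack(&attr, &addr, &size);
	if (err)
		goto out_attr;
	assert(addr == stack && size == STACK_SIZE);

	/*
	 * Opt this range out of THP.  Treated as non-fatal: the call can
	 * fail (e.g. EINVAL) on kernels built without THP support.
	 */
	if (madvise(addr, size, MADV_NOHUGEPAGE))
		perror("madvise(MADV_NOHUGEPAGE)");

	err = pthread_create(&tid, &attr, fn, arg);
	if (!err)
		err = pthread_join(tid, ret);

out_attr:
	pthread_attr_destroy(&attr);
out_unmap:
	munmap(stack, STACK_SIZE);
	return err ? -1 : 0;
}
```

A caller would use run_on_small_page_stack(touch_stack, &token, &out) and
check for a zero return. A real runtime would keep the thread running and
unmap the stack only after the thread exits; the join-then-unmap here just
keeps the sketch short.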