> On Mar 20, 2023, at 05:12, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 17.03.23 19:46, Mike Kravetz wrote:
>> On 03/17/23 17:52, Matthew Wilcox wrote:
>>> On Mon, Mar 06, 2023 at 03:57:30PM -0800, Mike Kravetz wrote:
>>>> One of our product teams recently experienced 'memory bloat' in their
>>>> environment. The application in this environment is the JVM which
>>>> creates hundreds of threads. Threads are ultimately created via
>>>> pthread_create which also creates the thread stacks. pthread attributes
>>>> are modified so that stacks are 2MB in size. It just so happens that
>>>> due to allocation patterns, all their stacks are at 2MB boundaries. The
>>>> system has THP always set, so a huge page is allocated at the first
>>>> (write) fault when libpthread initializes the stack.
>>>
>>> Do you happen to have an strace (or similar) so we can understand what
>>> the application is doing?
>>>
>>> My understanding is that for a normal app (like, say, 'cat'), we'll
>>> allow up to an 8MB stack, but we only create a VMA that is 4kB in size
>>> and set the VM_GROWSDOWN flag on it (to allow it to magically grow).
>>> Therefore we won't create a 2MB page because the VMA is too small.
>>>
>>> It sounds like the pthread library is maybe creating a 2MB stack as
>>> a 2MB VMA, and that's why we're seeing this behaviour?
>> Yes, pthread stacks create a VMA equal to stack size which is different
>> than 'main thread' stack. The 2MB size for pthread stacks created by
>> JVM is actually them explicitly requesting the size (8MB default).
>> We have a good understanding of what is happening. Behavior actually
>> changed a bit with glibc versions in OL7 vs OL8. Do note that THP usage
>> is somewhat out of the control of an application IF they rely on
>> glibc/pthread to allocate stacks. Only way for application to make sure
>> pthread stacks do not use THP would be for them to allocate themselves.
>> Then, they would need to set up the guard page themselves. They would
>> also need to monitor the status of all threads to determine when stacks
>> could be deleted. A bunch of extra code that glibc/pthread already does
>> for free.
>> Oracle glibc team is also involved, and it 'looks' like they may have
>> upstream buy in to add a flag to explicitly enable or disable hugepages
>> on pthread stacks.
>> It seems like consensus from mm community is that we should not
>> treat stacks any differently than any other mappings WRT THP. That is
>> OK, just wanted to throw it out there.
>
> I wonder if this might be one of the cases where we don't want to
> allocate a THP on first access to fill holes we don't know if they are
> all going to get used. But we might want to let khugepaged place a THP
> if all PTEs are already populated. Hm.
>
> --
> Thanks,
>
> David / dhildenb

Unless we do decide to start honoring MAP_STACK, we would be setting an
interesting precedent here in that stacks would be the only THP allocation
that would be denied a large page until it first proved it was actually
going to use all the individual PAGESIZE pages comprising one.

Should mapping a text page using a THP be likewise deferred until each
PAGESIZE page comprising it had been accessed?

Given the main questions of:

1) How to know whether it's a stack allocation
2) How to determine whether the app is consciously trying to allocate the
   stack via a THP or if it just happened to win the address alignment/size
   lottery
3) Whether to honor the THP allocation in either case

It seems taking the khugepaged approach would require Yet Another Flag to
provide a way for an application that KNOWS a THP-mapped stack would be
useful to get it without having to incorporate a loop to touch a byte in
every PAGESIZE page in their allocated aligned stack and hope it gets its
upgrade.

    William Kucharski
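
For reference, the do-it-yourself route Mike describes above (the
application maps its own stack, sets up the guard page, and frees the stack
once the thread is finished) might look roughly like the sketch below. It is
only an illustration, not what glibc or the JVM does: the names own_stack,
own_stack_alloc, run_on_own_stack and demo_worker are made up for this
example, it assumes Linux with glibc and a downward-growing stack, and it
treats madvise(MADV_NOHUGEPAGE) as best-effort.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one application-owned thread stack. */
struct own_stack {
    void  *base;   /* start of the mapping (the guard page) */
    size_t len;    /* total mapping length, guard included */
    size_t guard;  /* guard size in bytes (one page here) */
};

/* Map stack_size bytes plus one guard page and opt the range out of THP. */
static int own_stack_alloc(struct own_stack *s, size_t stack_size)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(s != NULL);
    if (page <= 0 || stack_size == 0 || stack_size % (size_t)page != 0)
        return -1;

    s->guard = (size_t)page;
    s->len = stack_size + s->guard;
    s->base = mmap(NULL, s->len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (s->base == MAP_FAILED)
        return -1;

    /* Assumption: the stack grows down, so the lowest page is the guard. */
    if (mprotect(s->base, s->guard, PROT_NONE) != 0) {
        munmap(s->base, s->len);
        return -1;
    }

    /* Best-effort: fails with EINVAL on kernels built without THP. */
    (void)madvise(s->base, s->len, MADV_NOHUGEPAGE);
    return 0;
}

/* Run fn(arg) on an application-owned stack; free it after the join. */
static int run_on_own_stack(void *(*fn)(void *), void *arg,
                            size_t stack_size, void **result)
{
    struct own_stack s;
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    if (own_stack_alloc(&s, stack_size) != 0)
        return -1;
    if (pthread_attr_init(&attr) != 0) {
        munmap(s.base, s.len);
        return -1;
    }

    /* The usable stack starts just above the guard page. */
    rc = pthread_attr_setstack(&attr, (char *)s.base + s.guard, stack_size);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);

    /* The stack may only be unmapped once the thread has finished. */
    if (rc == 0)
        rc = pthread_join(tid, result);
    munmap(s.base, s.len);
    return rc == 0 ? 0 : -1;
}

/* Example thread body: touches some stack pages and returns its argument. */
static void *demo_worker(void *arg)
{
    volatile char buf[64 * 1024];

    for (size_t i = 0; i < sizeof buf; i += 4096)
        buf[i] = 1;
    return arg;
}
```

A caller would use something like
run_on_own_stack(demo_worker, &arg, (size_t)2 << 20, &result). Joining
before the munmap is the simplest form of the "monitor the status of all
threads" bookkeeping; detached threads would need something more involved.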