> > Hate to even bring this up, but there are complaints today about 'allocation
> > time' of 1GB pages from the hugetlb pool. This 'allocation time' is actually
> > the time it takes to clear/zero 1G of memory. Only reason I mention is
> > using something like CMA to allocate 1G pages (at fault time) may add
> > unacceptable latency.
>
> One solution I had in mind is that you could zero these 1GB pages at free
> time in a worker thread, so that you do not pay the penalty at page allocation
> time. But it would not work if the allocation comes right after a page is
> freed.

In addition, there were several proposals to speed up zeroing of huge pages:

1. X86 specific: Cannon Matthews proposed the "clear 1G pages with streaming
   stores on x86" change.
   https://lore.kernel.org/linux-mm/20200307010353.172991-1-cannonmatthews@xxxxxxxxxx
   This speeds up setting up 1G pages by roughly 4 times.

2. X86 specific: Kirill and Andi proposed a similar change even earlier:
   https://lore.kernel.org/all/1345470757-12005-1-git-send-email-kirill.shutemov@xxxxxxxxxxxxxxx

3. Arch generic: Ktasks
   https://lwn.net/Articles/770826
   That allows zeroing HugeTLB pages in parallel.

4. VM specific:
   https://lwn.net/Articles/931933/
   Allows lazily zeroing 1G pages in the guest.

I looked through proposal (1) and did not see any major pushback. I do not see
why movnti can't be used specifically for gigantic pages.

Pasha
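
For illustration only, here is a minimal userspace sketch of the
streaming-store idea behind proposal (1). It is not the code from that patch:
zero_streaming() and all_zero() are hypothetical helpers, and the sketch
assumes an 8-byte-aligned buffer whose length is a multiple of 8. On x86-64
with SSE2 it uses the _mm_stream_si64 intrinsic, which compilers normally emit
as movnti; elsewhere it falls back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si64, _mm_sfence */
#define HAVE_STREAMING_STORES 1
#endif

/*
 * Hypothetical helper: zero len bytes at dst with non-temporal stores,
 * so the zeroed lines do not displace useful data in the cache.
 * Assumption: dst is 8-byte aligned and len is a multiple of 8.
 */
static void zero_streaming(void *dst, size_t len)
{
	assert(((uintptr_t)dst & 7) == 0 && (len & 7) == 0);
#ifdef HAVE_STREAMING_STORES
	long long *p = dst;
	size_t i;

	for (i = 0; i < len / 8; i++)
		_mm_stream_si64(&p[i], 0);
	/* Non-temporal stores are weakly ordered; fence before reuse. */
	_mm_sfence();
#else
	/* No streaming stores on this target: plain memset fallback. */
	memset(dst, 0, len);
#endif
}

/* Hypothetical helper: returns 1 if every byte in buf is zero. */
static int all_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}
```

A caller would pass a suitably aligned region, e.g. one obtained from
aligned_alloc(4096, len), and could check the result with all_zero(). Whether
this is actually faster than memset() depends on the CPU and on buffer size,
so it would need measuring on the target machine.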